THE PERSONALITY AND ACHIEVEMENT OF
THE CLASSROOM PARTICIPANT 


HENRY CLAY SMITH 


Department of Psychology, Michigan State College 
AND 
DONALD S. DUNBAR


Does a student gain more from a class when he voluntarily 
participates in discussions? The purpose of this study was to 
attempt to answer the question as applied to an introductory 
psychology course. A student’s participation was measured by 
counting the number of times he spoke in class during the year. 
Improvements in personal adjustment and critical thinking were 
estimated by tests given at the beginning and end of the course. 
Improvements were then related to the amount of the student’s 
participation. 

The authors’ attention was attracted to the problem by recent 
studies pointing out the value of participation to both the student 
and the client. Rogers’? has systematized a treatment which 
stresses the client’s participation. Roethlisberger’* has devel- 
oped a similar approach for industrial workers. In a broader 
context, Allport? has outlined a ‘psychology of participation.’ 
Impressed by the effect of such studies on teaching, Symonds"® 
devoted his presidential address to distinguishing between the 
roles of teacher and therapist. 

As one phase of this trend, numerous psychology courses have 
been organized to increase the student’s participation. The 
Human Relations course at Colgate is typical of the general 
approach used in such courses. Berrien describes this course as 
proceeding “with as little direction from the instructor as possible.
He avoids expressing his own opinions.” Cantor’ has written
more fully concerning his similar course. 

Such courses emphasize the improvements in objective thinking,
self-insight, and personal adjustment which arise from discussing
concrete problems in a permissive, student-centered
atmosphere. Berrien,* for example, states that his course aims 
“to develop an attitude and a way of thinking rather than a 
technical vocabulary.” The courses of Berrien,* Cantor,® 
Schwebel and Asch,'4 and Snyder” differ somewhat in details. 
However, they all lean heavily upon the assumptions and tech- 
niques of nondirective counseling as outlined by Rogers. !? 

It is found in nondirective counseling that the client does most 
of the talking. Rogers!* reports that the ratio of counselor to 
client talk furnishes one of the “sharpest differences” between the
directive and client-centered situations. He adds that in “nondirective
counseling the client comes to talk out his problems”
whereas in directive counseling “the counselor talks to the client.”
Applied to teaching, this has meant that the nondirective teacher 
has emphasized the participation of the student in discussions and 
has minimized or completely rejected lecturing. The assump- 
tion appears to be that this participation promotes student 
improvement. 

It is this participation aspect of client-centered practice which 
has been most readily adapted to classroom use. Although the 
nondirective courses mentioned above have varied considerably 
in the amount of responsibility each has granted the student in 
determining content and grades, they have all been in accord on 
the necessity and desirability of student participation. As will 
become clear later, the course described in this report is also a 
compromise between the ideal student-centered and traditional 
teacher-directed class. Therefore, the present study is not 
designed to test directly the effectiveness of student-centered 
teaching. Rather, it is intended to test the validity of applying 
certain aspects of student-centered teaching to an introductory 
course in psychology. This is not to suggest, of course, that a 
direct test of the effectiveness of truly student-centered teaching 
is not needed. Thus far, reported attempts to experimentally 
verify the effectiveness of nondirective teaching methods,” !*: 8 
have yielded quantitative results varying from negative to 
inconclusive. 

Specifically, this study was made to test the hypothesis that: 
Students who participate in class discussion improve their 
critical thinking, personal adjustment, and understanding 
of others more than those who do not participate. 
It was hoped that an experimental attack in a specific situation
would clarify the general problem of the relationship between 
participation and effective group behavior. 


METHOD 


Subjects.—The subjects in this experiment were one hundred 
eighteen college students in a two-semester introductory course 
in a small liberal arts college. Fifty-five per cent were sopho- 
mores, thirty per cent were juniors, and fifteen per cent were 
seniors. The course was not open to freshmen. The subjects 
were divided into four sections, which are referred to in this 
study as C, D, E, and G. The sections varied in numbers from 
twenty-seven to thirty-one. One author taught sections C and 
D, the other, sections E and G. 

Course.—The course was divided into four major units:
Introduction to Problem-solving, Problems of Personality, Prob- 
lems Concerning the Raw Materials of Personality, and Problems 
Concerning the Formative Influences of Personality. Munn’s 
General Psychology was utilized as the primary text. A course 
outline which included class and reading assignments was dis- 
tributed at the beginning of the course. Classes were devoted 
about equally to the administration and discussion of psycho- 
logical tests, the discussion of case problems, and the discussion of 
psychological facts and principles. Objective examinations 
provided the sole basis for grades in the first semester. Objec- 
tive and essay questions and an autobiography furnished the 
basis for grades in the second semester. 

In the classroom, efforts were made to encourage voluntary 
student participation. The instructors emphasized its value in 
achieving the objectives of the course. They presented psycho- 
logical test results, case studies, and psychological problems of 
interest to the students for discussion. As an aid in achieving 
participation, nondirective counseling was discussed and some of 
the techniques of the nondirective approach were employed in the 
classroom. The instructors accepted, reflected, and only occa- 
sionally offered interpretations of a student’s contributions. 
Neither praise nor criticism was employed.

However, the course as a whole was only partially nondirective: 
students were not given the responsibility for selecting the general 
subject for discussion, for deciding on course procedures, or for 
determining their grade, as is common in nondirective courses.
The classes in the course are best characterized as ‘free-discus- 
sion’ ones. In other words, the over-all structure of the course 
and the classes was directive in nature but participation was 
handled as it would be in a nondirective course. 

The amount of participation varied greatly from class to class. 
For the entire course and for all sections the median participation 
per student per class was .50. In other words, the typical student 
participated once every two classes and the typical fifty-minute 
session included approximately fifteen voluntary contributions by 
the students. 

Measurement of Participation.—Four student assistants (one in 
each section) recorded the number of times that each student 
participated in each class throughout the year. A ‘participation’ 
was defined as any voluntary verbal response made by a student 
regardless of its content or its length. The assistants tallied their
results on prepared sheets without the knowledge of the students. 
The few who inquired about the assistant’s activities were told 
that he was checking attendance. A reliability check, in the 
form of independent recordings of the same session by two observ- 
ers, was tried on a sample of classes and approached one hundred 
per cent agreement. The consistency of a student’s level of par- 
ticipation was high: rank order correlations by sections between 
student participation in the first semester and participation in the 
second semester ranged from .75 to .85. A student’s participa- 
tion score was determined by dividing his total number of 
participations in the two semesters by the number of days he was 
present. This was the score used in calculating the results. 
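
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the tallies in the study were kept by hand on prepared sheets, and the function name and the sample figures below are hypothetical.

```python
def participation_score(total_participations, days_present):
    """Voluntary participations per class attended, over both semesters."""
    if days_present <= 0:
        raise ValueError("student must have attended at least one class")
    return total_participations / days_present


# Hypothetical student: 30 voluntary contributions in 60 classes attended.
score = participation_score(30, 60)
print(score)  # 0.5, i.e. one participation every two classes
```

Dividing by days present rather than by days scheduled keeps absences from lowering a student's score.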

Test Program.—Table I shows the tests employed and the 
months in which they were given. This test program was 
designed primarily to measure progress toward the objectives 
of the course. These objectives, adapted from Wolfle,'* were to: 
acquaint the student with the most important facts, principles, 
and hypotheses of psychology; develop his skill in the critical 
analysis of psychological problems; and improve his ability to 
achieve socially desirable solutions of his personal problems. The 
achievement tests were used as a measure of mastery of psycho- 
logical facts and principles. The Case of Barry Black measured 
his ability to apply the principles to a concrete case. The 
Watson-Glaser Test was utilized as a measure of critical thinking. 

TABLE I.—EXPERIMENTAL TEST PROGRAM

Tests                                      Given in

Case of Barry Black (Horrocks)             Sept. 1948; Jan. 1949 (sections D, E); May 1949
Adjustment Inventory (Bell)                Sept. 1948; Jan. 1949 (sections C, G); May 1949
Security-Insecurity Inventory (Maslow)     Sept. 1948; Jan. 1949 (sections C, G); May 1949
Critical Thinking Test (Watson-Glaser)     Sept. 1948; May 1949

Personality Inventory (Bernreuter)         once
Ascendancy-Submission Test (Allport)       once
Intelligence Test (Henmon-Nelson)          once

Course Attitude Scale                      Jan. 1949
Course Achievement Tests                   four times, including Oct. 1948

The Bell and Maslow Inventories were used as a measure of 
success in solving personal problems. Since the objectives of the 
course were discussed with the students, the test program was 
meaningful and interesting to them. 

The first tests in Table I are those which were given more than 
once. The Case of Barry Black, a standardized case study with 
diagnostic, remedial, and total scores, is discussed by Horrocks 
and Troyer.'?’ The Bell Adjustment Inventory consists of one 
hundred forty questions yielding measures of family, health, 
social, emotional, and total adjustment. The Maslow Inventory, 
a clinically validated inventory yielding a measure of security 
feelings, is described by Maslow.* All students took these three 
tests in September and again in May. Also, in January, the Case 
of Barry Black was given to sections D and E, and the Inventories 
were given to sections C and G. The purpose of this was to measure
the amount of improvement after one, as well as after two,
semesters of the course. The Watson-Glaser Test includes sec- 
tions on generalizations, inferences, recognition of arguments, and 
assumptions. It was given only in September and May. 

The tests in the middle of Table I were given only once. These 
tests were the Bernreuter Personality Inventory, the Allport 
Ascendancy-Submission Test, and the Henmon-Nelson Intelli- 
gence Test. They gave measures of self-confidence, social 
independence, dominance, and intelligence. The results of these 
tests were used in determining the personality characteristics of 
the students who eventually became high participants in the 
course. 

The authors devised the course attitude scale and the course 
achievement tests listed in the last section of Table I. The 
attitude scale consisted of twenty statements about the course 
ranging from those which were very favorable to those which were 
very unfavorable. In January the students were asked to rate 
each of these statements on a five point scale from strong agree- 
ment to strong disagreement. The theoretical minimum and 
maximum scores on the scale were 20 and 100, the actual, 29 and 
91. The achievement tests were designed to measure the ability 
to apply psychological facts and principles to the objectives of 
the course, but no validity studies were made. They consisted 
of one hundred fifty to three hundred true-false and multiple 
choice items. 
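
The stated range of the attitude scale (20 to 100) follows from twenty statements each rated on five points. The sketch below shows one way such a scale could be scored. It assumes, as the text does not say, that ratings run from 1 (strong disagreement) to 5 (strong agreement) and that unfavorable statements are reverse-scored so that a high total means a favorable attitude; the function and argument names are hypothetical.

```python
def attitude_score(ratings, unfavorable):
    """Total score on a twenty-statement, five-point attitude scale.

    ratings: twenty integers, 1 (strong disagreement) to 5 (strong agreement).
    unfavorable: indices of unfavorably worded statements, which are
    reverse-scored here (an assumption, not stated in the source).
    """
    if len(ratings) != 20:
        raise ValueError("expected ratings for twenty statements")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    return sum(6 - r if i in unfavorable else r for i, r in enumerate(ratings))


# The theoretical extremes quoted in the text.
lowest = attitude_score([1] * 20, unfavorable=set())
highest = attitude_score([5] * 20, unfavorable=set())
print(lowest, highest)  # 20 100
```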

RESULTS 


These results are organized to consecutively answer the follow- 
ing questions: 

1) How can ‘participants’ and ‘nonparticipants’ be clearly 
differentiated? (See Figure 1) 

2) Did the participants and nonparticipants, thus defined, 
differ in aptitude, personality, and achievement at the beginning 
of the course? (See Table II) 

3) Did the class as a whole improve in their critical thinking, 
personality adjustment, and ability to deal with a concrete 
psychological problem during the course? (See Table III) 

4) Did the participants improve more than the nonpartici- 
pants? (See Table IV) 

Pattern of Participation.—Figure 1 shows the per cent of stu- 
dents with various amounts of participation. From this figure 
it may be seen that 34.7 per cent of the students participated less
than .3 times per class, or less than three times in ten meetings 
of the class. This group was defined as the ‘nonparticipants.’ 
At the other extreme, a total of 27.1 per cent of the students had 
an average participation of once each meeting or greater. The 
highest participant had an average of almost four voluntary 
contributions per meeting throughout the year. This group with 
one or more participations per meeting was defined as the 
‘participants.’ 
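
The two cut-off points can be written out as a small classification rule. The sketch below uses the thresholds given above; the label for the middle group and the sample scores are hypothetical.

```python
NONPARTICIPANT_BELOW = 0.3  # fewer than .3 participations per class
PARTICIPANT_AT_LEAST = 1.0  # one or more participations per class


def classify(score):
    """Assign a participation score to one of the groups defined in the text."""
    if score < NONPARTICIPANT_BELOW:
        return "nonparticipant"
    if score >= PARTICIPANT_AT_LEAST:
        return "participant"
    return "unclassified"  # hypothetical label for students between the cut-offs


scores = [0.0, 0.25, 0.5, 0.9, 1.0, 3.9]  # hypothetical participation scores
labels = [classify(s) for s in scores]
print(labels)
```

Students between the two cut-offs belong to neither group and are left out of the comparisons of extremes.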

These definitions were, of course, somewhat arbitrary. How- 
ever, they sharply differentiate the two groups. Although the 
two groups together include more than sixty per cent of the 
students, the median of the participants was almost twenty times 
as great as the median of the nonparticipants. This was due to 
the extremely skewed nature of the distribution. 

This skewed distribution was unexpected. Allport! has indi- 
cated that such J-curves are characteristic of conformity behavior. 
This, in turn, suggested the hypothesis that voluntary classroom 
participation is a nonconforming type of behavior. The data in 
the present study were examined for their bearing on this 
hypothesis. 

As an initial check on this hypothesis, the students were divided 
into those who belonged to a fraternity and those who did not. 
Approximately eighty-five per cent of the student body belonged 
to a fraternity. It was therefore assumed that joining a frater- 
nity was conforming behavior. If the hypothesis is correct, the 
fraternity members should be lower in participation. The results 
of the analysis support the hypothesis: the median participation of 
the nonfraternity group (N = 16) was 1.45 and of the fraternity
group (N = 102) was .39. In other words, the nonfraternity
students participated almost four times as frequently as the frater- 
nity members. This difference was found to be significant at the 
one per cent level of confidence. 

The relatively high consistency of participation throughout the 
course is also of some relevance to the hypothesis if we assume that 
conformity-nonconformity is a consistent personality variable. 
As previously mentioned, the rank order correlations by sections 
for participation between the first and second semester ranged 
from .75 to .85. The product moment correlation for all students 
between participation in the first semester and participation in the 
second semester was .67. 
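
Two kinds of coefficient are quoted here: rank order and product moment. The sketch below computes both from the same data so that the difference in method is visible. It is a generic illustration, not a reconstruction of the authors' computations; the two lists of scores are hypothetical, and the rank order coefficient is obtained as the product moment correlation of the ranks, with tied values sharing their mean rank.

```python
from math import sqrt


def pearson(x, y):
    """Product moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank from 1 upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_order(x, y):
    """Rank order correlation: product moment correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical first- and second-semester participation scores.
first = [0.0, 0.1, 0.2, 0.4, 0.9, 3.5]
second = [0.1, 0.0, 0.3, 0.6, 0.7, 2.0]
print(round(rank_order(first, second), 3))
print(round(pearson(first, second), 3))
```

With a strongly skewed distribution the two coefficients need not agree closely, since the product moment value is sensitive to a few extreme scores while the rank order value is not.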

TABLE II.—INITIAL DIFFERENCES BETWEEN PARTICIPANTS AND NONPARTICIPANTS

                             Participants   Nonparticipants    Difference
Tests                          N     Mean        N     Mean      P-N    Conf. Level

Barry Black:
  Diagnosis                   32     74.1       41     58.2     15.9    .007
  Remedial                    32     57.2       41     46.1     11.1    .05
  Total                       32    131.3       41    104.3     27.0    .002
Adjustment:
  Family                      30      7.6       36      6.8      -.8    Insign.
  Health                      30      7.9       36      8.1       .2    Insign.
  Social                      30      6.1       36     12.5      6.4    .0001
  Emotional                   30      8.2       36      9.4      1.2    Insign.
  Total                       30     29.8       36     36.8      7.0    .05
Security-Insecurity           32     21.9       36     17.9      4.0    Insign.
Critical Thinking:
  Generalizations             32     15.9       41     15.6       .3    Insign.
  Inferences                  32     35.9       41     33.1      2.8    .01
  Arguments                   32     78.2       41     71.4      6.8    .0001
  Assumptions                 32     15.3       41     14.0      1.3    .05
Personality:
  Self-Confidence             30    -73.0       41    -16.3     56.7    .003
  Social Independ.            14      1.4       21    -40.5     41.9    .05
  Ascendancy                  32     12.4       40      1.8     10.6    .01
Intelligence                  30     64.3       41     59.5      4.9    .02
Course Attitude               26     59.5       33     62.7     -2.8    Insign.
Course Achievement (Oct.)     31    102.7       41     92.3     10.4    .01

Differences between Participants and Nonparticipants at the
Beginning of the Course.—More evidence in favor of the conformity
hypothesis is revealed by contrasting initial differences
between participants and nonparticipants in personality and 
achievement. Table II compares the performance of participants 
and nonparticipants on the first administrations of all the tests 
employed. A plus sign in the difference column indicates greater 
achievement, more desirable scores on the personality variables, or 
better adjustment on the part of the participants. The confi- 
dence levels in the last column to the right are based on critical 
ratios. They are to be interpreted as the probability of a differ- 
ence as great or greater than that actually obtained occurring by 
chance. In this and other tables confidence levels at ten per 
cent or higher are labelled ‘insignificant.’ 
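
A critical ratio is the difference between two means divided by the standard error of that difference, and the confidence level is the normal-curve probability of a ratio at least that large. The sketch below shows the computation under two assumptions not stated in the text: that the groups are treated as independent, and that the probability is two-tailed. The standard errors in the example are hypothetical, since Table II reports only means.

```python
from math import erfc, sqrt


def critical_ratio(mean_a, mean_b, se_a, se_b):
    """Difference between two means over the standard error of the difference.

    Assumes independent groups, so the squared standard errors add.
    """
    return (mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)


def two_tailed_probability(cr):
    """Chance of a normal deviate at least as large as |cr| in either direction."""
    return erfc(abs(cr) / sqrt(2))


# Means from the Diagnosis row of Table II; standard errors are hypothetical.
cr = critical_ratio(74.1, 58.2, se_a=3.0, se_b=4.0)
print(round(cr, 2), round(two_tailed_probability(cr), 4))
```

Under the labelling rule above, a probability of .10 or higher from such a computation would be reported as insignificant.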

In regard to the various measures of ability, these results show 
that the participants were always higher than the nonpartici- 
pants. They were significantly superior in the three parts of the 
Case of Barry Black, in the Henmon-Nelson Intelligence Test, in 
the course achievement test, and in three out of four sections of 
the Critical Thinking Test. Only in the test of generalizations 
was their superiority insignificant. 

The most significant superiority of the participants was in the 
recognition of arguments section of the Critical Thinking Test. 
The difference was significant at the .0001 level of confidence. 
This test consists of a series of issues with five statements after 
each. The subject is asked to indicate those statements which, if 
true, would be ‘Strong’ support for the issue and those statements 
which, if true, would indicate only ‘Weak’ support for the issue. 
The ability to discriminate weak from strong arguments, in other 
words, was the ability which most sharply discriminated the 
participants from the nonparticipants. This test was given in the 
second week of the course. 

The personality test results show the same trend. The partici- 
pants began the course significantly better adjusted socially and 
significantly more self-confident, ascendant, and socially inde- 
pendent. They were insignificantly better adjusted in the health 
and emotional areas and were slightly more secure. The only 
superiority of the nonparticipants was in family adjustment, but 
this was statistically insignificant. 

In the personality area, the most significant superiority of the
participants was in social adjustment (.0001 level of confidence).
According to the Bell manual, individuals scoring high
on this scale “tend to be submissive and retiring in their social
contacts” while individuals scoring low “are aggressive in social
contacts.” The participants, of course, were low on this scale.
From one point of view, these results are merely a validation of 
the social adjustment scale since the participants were showing 
aggressiveness in one kind of social situation. The social adjust- 
ment scale consisted of thirty-two statements. Among these 
were the following: Do you often hesitate to speak out in a group 
lest you say and do the wrong thing? Do you find it very diffi- 
cult to speak in public? Would it be difficult for you to give an 
oral report before a group of people? 

Contrary to expectations, the nonparticipants, as measured by 
the course attitude scale, were more interested in the course than 
the participants. The difference was not statistically significant, 
but the direction is of interest. It is possible that the partici- 
pants, because of their higher initial achievement and generally 
superior ability, found the course somewhat less challenging than 
the nonparticipants. They may also have been less interested in 
the objectives of the course for similar reasons. At any rate, 
participation was no index of interest. 

It is evident from these results that the students who became
participants were significantly different at the beginning of the 
course from those who became nonparticipants. These differ- 
ences are consistent with the hypothesis that voluntary participa- 
tion was nonconforming behavior. The personality factors 
related to participation would make it easy for the student pos- 
sessing them to be a nonconformist. The participant, in com- 
posite, was the intelligent, well-informed student with high 
self-confidence, social independence, dominance, and social 
aggressiveness. 

General Improvement During the Course.—Table III shows the 
average improvements made during the course by all subjects. 
These results are given for one and for two semesters of the 
course. The differences between improvements in one and two
semesters are indicated. The test data upon which this analysis
is based are outlined in Table I. The Critical Thinking Test was
given only at the beginning and the end of the year so that the
analysis on this test is given for just two semesters. Confidence
levels were based on the mean and standard error of differences
between initial and final scores.

TABLE III.—DIFFERENCES IN IMPROVEMENT MADE IN ONE AND TWO SEMESTERS

                                    Scores         Improvements     Differences
Tests                        N  Initial    Final     F-I   Conf.     2 sem.   Conf.
                                                           Level   - 1 sem.   Level

Barry Black:
  Diagnosis
    Two Semes.              38     66.9     89.2    22.3   .001
    One Semes.              50     60.7     82.8    22.1   .001          .2   Insign.
  Remedial
    Two Semes.              38     51.8     59.7     7.9   .01
    One Semes.              50     51.0     54.9     3.9   Insign.      4.0   Insign.
  Total
    Two Semes.              38    118.7    148.9    30.2   .0001
    One Semes.              50    111.7    137.7    26.0   .0005        4.2   Insign.
Adjustment:
  Family
    Two Semes.              36      5.3      5.8     -.5   Insign.
    One Semes.              42      6.3      5.5      .8   Insign.     -1.3   Insign.
  Health
    Two Semes.              36      7.1      6.0     1.1   .03
    One Semes.              42      7.2      5.9     1.3   .01          -.2   Insign.
  Social
    Two Semes.              36      8.0      6.5     1.5   .05
    One Semes.              42     10.0      6.9     3.1   .001        -1.6   Insign.
  Emotional
    Two Semes.              36      7.7      7.6      .1   Insign.
    One Semes.              42      8.6      6.2     2.4   .001        -2.3   Insign.
  Total
    Two Semes.              36     28.1     25.9     2.2   Insign.
    One Semes.              42     32.3     25.1     7.3   .0001       -5.1   .03
Security-Insecurity
    Two Semes.              34     27.4     39.2    11.8   Insign.
    One Semes.              46      3.9     56.0    52.1   .001       -40.3   .09
Critical Thinking (Two Semes.):
  Generalizations           86     15.6     16.2      .6   .01
  Inferences                86     34.3     35.1      .8   Insign.
  Arguments                 86     72.6     73.6     1.0   Insign.
  Assumptions               83     14.6     15.3      .7   .02

In general, out of twenty-two comparisons, twenty-one showed 
improvement during the course. Of the twenty-one improve- 
ments, fourteen were at or below the five per cent level of confi- 
dence. The most significant improvements were made on the 
Case of Barry Black. Here both the one- and two-semester 
groups were at or below the one per cent level of confidence. The 
single decrement was an insignificant decrease for the two-semester
group in family adjustment.
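
The counts in this paragraph can be checked against the improvement and confidence-level columns of Table III. The sketch below transcribes those two columns and tallies them; entries marked "Insign." in the table are recorded as None, and the variable names are hypothetical.

```python
# (test, group, improvement F-I, confidence level or None for "Insign.")
TABLE_III = [
    ("Diagnosis", "two", 22.3, .001), ("Diagnosis", "one", 22.1, .001),
    ("Remedial", "two", 7.9, .01), ("Remedial", "one", 3.9, None),
    ("Barry Black total", "two", 30.2, .0001),
    ("Barry Black total", "one", 26.0, .0005),
    ("Family", "two", -.5, None), ("Family", "one", .8, None),
    ("Health", "two", 1.1, .03), ("Health", "one", 1.3, .01),
    ("Social", "two", 1.5, .05), ("Social", "one", 3.1, .001),
    ("Emotional", "two", .1, None), ("Emotional", "one", 2.4, .001),
    ("Adjustment total", "two", 2.2, None),
    ("Adjustment total", "one", 7.3, .0001),
    ("Security-Insecurity", "two", 11.8, None),
    ("Security-Insecurity", "one", 52.1, .001),
    ("Generalizations", "two", .6, .01), ("Inferences", "two", .8, None),
    ("Arguments", "two", 1.0, None), ("Assumptions", "two", .7, .02),
]

comparisons = len(TABLE_III)
improved = [row for row in TABLE_III if row[2] > 0]
significant = [row for row in improved if row[3] is not None and row[3] <= .05]
decrements = [(row[0], row[1]) for row in TABLE_III if row[2] < 0]

print(comparisons, len(improved), len(significant))  # 22 21 14
print(decrements)  # [('Family', 'two')]
```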

The two-semester group improved insignificantly more than 
the one-semester group on the Case of Barry Black. However, 
the one-semester group benefited more from practice effects and 
from lower initial scores. The two-semester group improved less 
than the one-semester on all parts of the Bell and on the security 
test. The superiority of the one-semester group was significantly 
greater for the total score on the Bell. However, the one- 
semester group was initially less well adjusted in all areas and less 
secure. It is likely that if the groups could have been equated 
for initial scores and for practice effects, the superior improve- 
ment of the two-semester group would have been greater on the 
Case of Barry Black and its inferior improvement on the adjust- 
ment scales would have been less different from the one-semester 
group. On the whole, there is no real evidence to show that two 
semesters of the course had any more influence than one semester. 

Incidentally, there was no evidence of a difference in honesty 
as far as the two groups were concerned. Inserted in the Maslow 
Inventory was an adaptation of the fifteen ‘L’ (lie) items from 
the Minnesota Multiphasic Test. The one- and two-semester 
groups had exactly the same average number of positive responses 
on these items (2.8) initially, and both groups decreased their 
scores on those items slightly in the final administration. Dis- 
cussion with students at the end of the course showed no aware- 
ness of the nature or purpose of these items. 

Since no control classes were employed, the generally significant 
improvements do not establish the effectiveness of the course. 
However, the improvements do provide some basis for expecting 
differences between participants and nonparticipants. If no 
improvements had occurred, a comparison between participants
and nonparticipants could only have proceeded on the assumption
that one of these groups would increase while the other decreased,
the changes balancing each other. 

Comparative Improvements of Participants and Nonparticipants. 
The primary purpose of this study was to compare the relative 
improvements of participants and nonparticipants. It has been 
shown, however, that the participants and nonparticipants were 
distinctly different groups at the beginning of the course. In 
order to make valid comparisons, therefore, it was necessary to 
match participants and nonparticipants on the basis of their 
initial scores. This was done for all tests which were administered 
at least twice, as well as for the course achievement tests given at 
the end of the first and second semesters. In the case of the 
achievement tests, the subjects were matched on the basis of the 
objective examination given in October. In addition to being 
matched for initial scores, the participants and nonparticipants 
were matched for section and for the number of times they had 
taken a test. Because of the initial superiority of the participant 
group, the number of cases was severely limited in some compari- 
sons. The results of this analysis are given in Table IV. 
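
The matching step can be pictured as pairing each participant with a nonparticipant from the same section who had taken the test the same number of times and had nearly the same initial score. The sketch below is one possible procedure, not the authors' own, which the text does not describe in detail: the greedy nearest-score pairing, the tolerance of half a point, the record fields, and the sample data are all assumptions.

```python
from collections import defaultdict


def match_pairs(participants, nonparticipants, tolerance=0.5):
    """Pair participants with nonparticipants on section, number of
    administrations, and initial score (within `tolerance` points)."""
    pools = defaultdict(list)
    for n in nonparticipants:
        pools[(n["section"], n["times_tested"])].append(n)
    pairs = []
    for p in sorted(participants, key=lambda s: s["initial"]):
        pool = pools[(p["section"], p["times_tested"])]
        candidates = [n for n in pool
                      if abs(n["initial"] - p["initial"]) <= tolerance]
        if not candidates:
            continue  # an unmatched case is dropped, which reduces N
        best = min(candidates, key=lambda n: abs(n["initial"] - p["initial"]))
        pool.remove(best)  # each nonparticipant is used at most once
        pairs.append((p, best))
    return pairs


# Hypothetical records.
participants = [
    {"id": "P1", "section": "C", "times_tested": 2, "initial": 62.0},
    {"id": "P2", "section": "C", "times_tested": 2, "initial": 90.0},
    {"id": "P3", "section": "D", "times_tested": 3, "initial": 55.0},
]
nonparticipants = [
    {"id": "N1", "section": "C", "times_tested": 2, "initial": 62.4},
    {"id": "N2", "section": "D", "times_tested": 3, "initial": 55.2},
    {"id": "N3", "section": "D", "times_tested": 2, "initial": 55.0},
]
pairs = match_pairs(participants, nonparticipants)
print([(p["id"], n["id"]) for p, n in pairs])
```

In the sample data the highest-scoring participant finds no partner, which illustrates how the initial superiority of the participant group limits the number of matched cases.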

The data give very little support to the hypothesis that those 
who participate in classroom discussion improve more than those 
who do not. The participants improved significantly more than
the nonparticipants in only one of the fifteen comparisons made. 
Considering only the direction of the differences, ten favored the 
participants, five the nonparticipants. 

The somewhat arbitrary division of these results into improve- 
ments in ‘intellectual areas’ and improvements in ‘adjustment 
areas’ is suggestive. Of nine possible comparisons in the intel- 
lectual area (three in Barry Black, four in critical thinking, and 
two in course achievement) the participants were superior in 
eight. On the other hand, of six possible comparisons in the area 
of adjustment (five in the Bell Adjustment and one in Maslow 
Inventory) the nonparticipants were superior in four. In other 
words, this classification suggests that the participants may have 
improved more in the intellectual area but certainly did not in 
the adjustment area. This latter result is of importance since it is 
in this area of adjustment that nondirective reports have indi- 
cated the most important improvements might be found. 
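
The direction counts quoted in the last two paragraphs can be re-derived from the P-N column of Table IV. The sketch below transcribes that column and groups the fifteen comparisons into the two areas named in the text; the variable and function names are hypothetical.

```python
# (test, area, P-N difference in improvement) as read from Table IV
TABLE_IV = [
    ("Diagnosis", "intellectual", 6.9),
    ("Remedial", "intellectual", -2.3),
    ("Barry Black total", "intellectual", 7.1),
    ("Family", "adjustment", -.6),
    ("Health", "adjustment", .4),
    ("Social", "adjustment", -1.2),
    ("Emotional", "adjustment", .3),
    ("Adjustment total", "adjustment", -2.9),
    ("Security-Insecurity", "adjustment", -34.5),
    ("Generalizations", "intellectual", 1.3),
    ("Inferences", "intellectual", 2.5),
    ("Arguments", "intellectual", .4),
    ("Assumptions", "intellectual", 1.6),
    ("First-semester achievement", "intellectual", 3.1),
    ("Second-semester achievement", "intellectual", 2.1),
]


def tally(area=None):
    """Return (comparisons, favoring participants, favoring nonparticipants)."""
    rows = [r for r in TABLE_IV if area is None or r[1] == area]
    favor_p = sum(1 for r in rows if r[2] > 0)
    favor_n = sum(1 for r in rows if r[2] < 0)
    return len(rows), favor_p, favor_n


print(tally())                # (15, 10, 5)
print(tally("intellectual"))  # (9, 8, 1)
print(tally("adjustment"))    # (6, 2, 4)
```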

By way of caution it should be pointed out that though the
cases were drawn from the extremes of the distribution and were
carefully matched, their number was small. Furthermore, the
reliabilities and validities of the tests employed are open to
question. However, there is no evidence that the nonparticipants
were less honest. With participants and nonparticipants
matched on the basis of their initial ‘L’ score, the nonparticipants
gave an insignificantly more honest response on the second
administration of the test.

TABLE IV.—DIFFERENCES IN IMPROVEMENT FROM SEPTEMBER TO MAY BETWEEN
PARTICIPANTS AND NONPARTICIPANTS

                                    Scores     Improvements   Differences
Tests                        N    Sept.      May May-Sept.      P-N   Confid. Level

Barry Black:
  Diagnosis
    Part.                   14     62.2     92.4      30.2
    Nonp.                   14     62.6     85.9      23.3      6.9   Insign.
  Remedial
    Part.                   17     50.5     62.2      11.7
    Nonp.                   17     50.5     64.5      14.0     -2.3   Insign.
  Total
    Part.                   15    113.3    157.8      44.5
    Nonp.                   15    113.4    150.8      37.4      7.1   Insign.
Adjustment:
  Family
    Part.                   17      5.5      5.8       -.3
    Nonp.                   17      5.5      5.2        .3      -.6   Insign.
  Health
    Part.                   16      6.6      5.8        .8
    Nonp.                   16      6.7      6.3        .4       .4   Insign.
  Social
    Part.                   11      6.9      7.3       -.4
    Nonp.                   11      6.9      6.1        .8     -1.2   Insign.
  Emotional
    Part.                   18      8.9      7.4       1.5
    Nonp.                   18      9.0      7.8       1.2       .3   Insign.
  Total
    Part.                   17     30.5     28.6       1.9
    Nonp.                   17     30.5     25.7       4.8     -2.9   Insign.
Security-Insecurity:
    Part.                   15     47.5     77.8      30.3
    Nonp.                   15     50.1    104.9      54.8    -34.5   Insign.
Watson-Glaser:
  Generalizations
    Part.                   18     15.4     16.8       1.4
    Nonp.                   18     15.4     15.5        .1      1.3   .01
  Inferences
    Part.                   19     34.4     36.1       1.7
    Nonp.                   19     34.4     33.6       -.8      2.5   .08
  Arguments
    Part.                   11     73.8     75.9       2.1
    Nonp.                   11     73.8     75.5       1.7       .4   Insign.
  Assumptions
    Part.                   14     14.3     15.8       1.5
    Nonp.                   14     14.3     14.2       -.1      1.6   .07
Course Achievement:
  First Semester
    Part.                   25     98.4    171.1
    Nonp.                   25     98.5    168.0                3.1   Insign.
  Second Semester
    Part.                   24     97.6    109.8
    Nonp.                   24     97.6    107.1                2.1   Insign.


DISCUSSION OF RESULTS 


The most significant findings in this study were those supporting
the hypothesis that voluntary participation was nonconforming 
behavior: the skewed distribution of participation, the greater 
participation of nonfraternity members, and the personality 
characteristics of the participants. These facts were unexpected. 
They require clarification before the general significance of this 
study for the initial hypothesis can be fully understood. 

How far can the hypothesis of the nonconforming nature 
of voluntary participation be generalized? It seemed amply 
demonstrated in the present study. Does it hold in other classes 
of the same institution? Does it hold in institutions with differ- 
ent social atmospheres? Some additional observations and evi- 
dence are available regarding these questions. 

While the present study was being conducted, one of the au- 
thors conducted a course in the Psychology of Personality in the 
nondirective manner. The students were given full responsibility 
for the conduct of the course, including the determination of
classroom procedures and grades. It was observed that a few 
individuals dominated the classes; one of the two sections was 
dominated by a very intelligent, premedical senior who severely 
criticized the contributions of others. Nonparticipating students 
stated in interviews that they did not participate because of the 
feeling that their classmates were critical and nonpermissive. 
These observations, though unsystematic, are entirely consistent 
with the findings of the present study. 

In a better controlled check, one of the authors repeated the 
measurements of participation in an introductory course with 
thirty students. It was conducted in a very similar manner but 
in a quite different setting: a six-week summer session in a large 
public coeducational college. An analysis of the participation 
data for the six weeks showed an almost identical distribution to 
that found in the present study. Using the categories of ‘nonparticipant’
(0-.2), ‘doubtful’ (.3-.9), and ‘participant’ (1.0-up),
the summer school group showed forty, forty-three, and seventeen
per cent, respectively, in these categories. The comparable
figures in the present study were thirty-five, thirty-eight, and 
twenty-seven. In other words, the summer school group showed 
an even more skewed distribution with the implication, again, that 
voluntary participation was nonconforming behavior. 

In this check, only the Henmon-Nelson and achievement tests 
were available for analysis. Contrary to the findings of the 
present study, the rank order correlations between these tests and 
participation were approximately zero. The inference, then, is 
that the social control of participation existed in this different 
setting but that the correlates of participation varied. It seems 
probable that personality variables such as ascendancy and 
self-confidence are more basic correlates than aptitude and 
achievement. 

This study was designed to test the hypothesis that participants 
gain more from a course than nonparticipants. It is only weakly 
and inconsistently supported by the results. The participants 
may have improved slightly more in critical thinking, course 
achievement, and understanding of a case problem. However, 
they did not show greater gains in personal adjustment. Even 
the slight gains may not be causally related to participation, since 
participation may have been merely a symptom of greater 
academic motivation. 

These results conflict with the widely held belief in the value of 
participation. However, they are consistent with experimental 
findings from quite different sources. Remmers'! found no 
advantage of the discussion over the lecture method in psychology 
classes. Milner® found a student preference for more structuring 
than occurs in nondirective classes. The series of experiments 
on nondirective classes have produced no definite evidence sup- 
porting such procedures. Carnes and Robinson® found no causal 
relationship between the amount of client talk and therapeutic 
gains. 

The results point to a possible error among those who believe 
in the value of participation: they may accept as a result of 
participation what may be largely the cause of it. In other 
words, high achievement and good adjustment may cause 
participation rather than participation causing them. The
former assumption is supported by the fact that the participants 
were a very superior group at the beginning of the course but 
that their participation had relatively little effect on their 
improvement. 

From this study it appears that the student who participates in 
a free discussion is selected by a process which is not related to the 
educational or therapeutic benefits he is likely to obtain. Like 
intercollegiate football, the free discussion appeared to select the 
most able participants, not those most likely to improve through 
their participation. Possibly the high participant should be 
viewed as a leader for the group rather than the sole beneficiary 
of his own participation. 

Although the present study did not follow a completely non- 
directive approach, the results seem to have implications for 
classes which are so conducted. It appears that it is the excep- 
tional student who enters a college class with the desire volun- 
tarily to participate. Most students seem to resist the suggestion 
that they participate because they expect to be criticized by other 
students if they do. Since it is an essential element of the non- 
directive approach that students participate in an uninhibited 
manner, overcoming such expectations would appear to be a diffi- 
cult obstacle to the nondirective teacher. 

In conclusion, the relationship found in this experiment 
between the amount a student participates in class and his educa- 
tional and therapeutic improvement closely parallels the findings 
of Carnes and Robinson® on the influence of the amount of client 
talk on therapeutic objectives. Substituting the words student 
and teacher in the appropriate places, their summary is repeated: 
“The causal relationship between the amount of client (student)
talk and desirable interview (teaching) outcomes is not clear.
Therefore, it is not possible to use the amount of client (student)
talk as a criterion of counseling (teaching) effectiveness.”

In regard to future research with the discussion group, the gen- 
eral viewpoint of the authors coincides with that expressed by 
Bradford and French.‘ They state that “the complexity of the
discussion method means that the educator who wishes to use
this powerful tool can never succeed with a bag of simple tricks
. . . The discussion leader must have a basic understanding of
group dynamics and of individual change . . . but the science of
group dynamics is so young that only a very meager number of
scientific facts and laws have been accumulated . . . Clearly we
need more research on the dynamics of the discussion group and
the techniques for stimulating group growth and development.”


SUMMARY 


The purpose of this study was to determine the effects of volun- 
tary classroom participation on course improvement. The sub- 
jects were one hundred eighteen male students in four sections of 
a general psychology class. On the basis of participation records 
made during the course, the lowest thirty-five per cent of the 
students were classified as ‘nonparticipants,’ the highest twenty- 
seven per cent as ‘participants.’ 

The results give little support for the assertion that voluntary 
participation causes improvement during a course. When par- 
ticipants and nonparticipants were matched on scores made at 
the beginning of the course, the participants showed significantly 
greater improvement in only one out of fifteen variables. The 
data suggest that the participants may have improved slightly 
more than the nonparticipants in achievement but not in adjust- 
ment areas. The class as a whole showed significant improve- 
ments during the course in fourteen out of twenty-two compari- 
sons made. 

As a partial explanation of this essentially negative finding, the 
hypothesis is advanced that voluntary participation was non- 
conforming behavior. In support of this hypothesis, it is shown 
that the distribution of participation followed the J-curve char- 
acteristic of conformity behavior, that nonfraternity members 
participated significantly more often than fraternity members, 
and that the participants had those characteristics at the begin- 
ning of the course which would make it easier for them to be 
nonconforming. They were significantly more self-confident, 
ascendant, socially independent, socially aggressive, intelligent, 
critical, and informed. They were no more interested in the 
course than the nonparticipants. 

The implications of these results for nondirective teaching and 
further research on group behavior are discussed. 


SOME SAMPLING PROBLEMS IN
EDUCATIONAL RESEARCH* 


ELI S. MARKS


Bureau of the Census 


In the past few years considerable progress has been made both 
in the theory and the practice of sampling as a research tool. 
Possibly the most important development in the field has been 
the clarification of concepts and the increasing realization that 
we can get meaningful answers from statistics only when we ask 
meaningful questions. 

I am sure that all of us have had the experience of reading a 
study which used very elaborate statistical techniques and wind- 
ing up completely dissatisfied with the conclusions. The 
mechanical application of analysis of variance, normalization of 
distributions, chi-square, factor analysis and the like, without a 
real examination of underlying concepts and assumptions has 
led to a situation in which many sincere and intelligent research
workers view all statistical analyses as sterile manipulations of 
meaningless numbers. This situation is not a fault of statistics. 
It is, rather, an effect of the substitution of techniques for 
thought, of a preoccupation with statistical means and an obscur- 
ing of statistical ends. While I am sure there are many problems 
in educational research which require and merit development of 
new sampling techniques, I should like to emphasize in this paper 
the need not for new techniques but for a more discerning appli- 
cation of old ones. 

All of the sampling techniques mentioned in this paper have 
been used successfully in fields other than educational research. 
The present paper is concerned with the feasibility of practical 
application of these techniques to educational research. An 
attempt has been made to outline the general nature of the 
applications but the scope of the paper does not permit the 
development of details. Modern sampling practice and theory 
involve using a combination of several sampling methods to 
secure greater efficiency. The development of an efficient sam- 
pling plan will usually require the services of a skilled sampling 
technician. For the research worker who wishes to develop his 
own sampling designs, a bibliography is appended to this paper.
The bibliography has been purposely restricted to relatively few 
references, the selection being limited to those references which 
are likely to be both useful and accessible to workers in the field 
of educational research. 

In designing a sample, a knowledge of sampling techniques can 
be very helpful and a good sampling statistician will know how to 
select the most efficient sampling design, but no sampling tech- 
nician and no book or article on sampling can do anything for 
you until you have determined the ends to be served by the 
sample design. 

DEFINING THE POPULATION 


The matter of definition of a population is fundamental and 
has, in many statistical studies, been given inadequate attention. 
All too frequently, research workers draw their samples and their 
conclusions from different populations. We start with a study 
of the ‘relation between birth order and feelings of insecurity’; 
secure fifty to one hundred subjects through a sympathetic friend 
who happens to be superintendent of schools in a conveniently 
located city and conclude that ‘feelings of insecurity are not 
correlated with birth order.’ I have no particular quarrel with 
such a conclusion—it may well be true for many populations— 
but it has nothing to do with the data collected. From a group 
of cases picked up without consideration of what population is 
represented, we can conclude nothing about any population other 
than the cases actually observed. 

It should be noted that a valid sample and statistically sound 
conclusions can be drawn from the school system to which our 
obliging friend, the superintendent, gives us access. A research 
finding does not have to apply to the whole population of the 
United States or the whole human race in order to be scientifically 
valuable. Our difficulties in drawing sound conclusions from 
samples stem from our own over-ambitiousness. We are not 
content with reporting our conclusions as applying to our own 
school or university or local community—we insist on discussing 
our results as if they apply to every human population. 

If our sample is restricted to a school, group of schools, a com- 
munity, a city or a state, our statistical conclusions must be 
similarly restricted. If we discuss the significance of a difference 
or a correlation coefficient or some other sample estimate, we 
should bear in mind that the conclusion applies only to the population
sampled. Speculations on applicability to other populations
are entirely proper and may be extremely valuable, but they
should be labelled as speculations and not as statistical inferences. 


LISTING THE POPULATION 


The definition should be explicit enough to permit anyone to
say with confidence: “This object or person is in the defined population
and this one is not.” The next step in selecting a sample
is to ‘list’ the population. Some types of sampling (e.g., simple 
random sampling) require a listing of each population element. 
For other types of sampling (cluster sampling) it is sufficient to 
list the population in groups or clusters and to list the individual 
elements only for those clusters actually selected for the sample. 
For example, drawing a simple random sample of the public 
school children of the United States enrolled on March 1, 1950, 
would require listing every child. To draw a cluster sample it 
would be sufficient to list every school and list children only for 
the schools selected for the sample. Whether we list individual 
elements or groups of elements, the list from which the sample is 
drawn must include the entire population. 


SIMPLE RANDOM SAMPLING 


Whether we draw a sample of elements or a sample of clusters, 
it is essential that the selection of the sample units be made by 
some process of known random character. The temptation to 
substitute personal judgment for a table of random numbers 
may be great, but yielding to it is dangerous if we hope to apply 
any form of probability inference to our results. Actually, the 
statistical techniques outlined in most textbooks are valid only 
for simple random sampling. For example, the familiar formulae
for the estimated standard error of a mean or a percentage apply
only when all the observations are drawn independently and with 
equal probability, conditions usually satisfied only by simple 
random samples. 
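
As a reminder of what these familiar formulae compute, the following Python sketch evaluates them for a simple random sample. The function names are illustrative and do not come from this paper, and the finite population correction is ignored, which assumes that the sample is a small fraction of the population.

```python
import math
import statistics


def se_mean(values):
    """Estimated standard error of a sample mean: s / sqrt(n)."""
    n = len(values)
    return statistics.stdev(values) / math.sqrt(n)  # stdev uses n - 1


def se_proportion(p, n):
    """One common textbook form for a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical test scores from eight independently drawn students.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(se_mean(scores))
print(se_proportion(0.5, 100))
```

Neither function is appropriate, without modification, when the observations were drawn in clusters or with unequal probabilities.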

While simple random sampling is a desirable technique for 
many purposes, such sampling is in most cases a luxury which 
few research workers can afford. It requires a complete listing 
of the individual population elements. Such a listing might be 
available for some school populations—might, for example, be 
obtainable for a population of all teachers in a school system—
but, in many cases, such listings can be obtained only at extremely 
high cost. Even where lists are available, simple random sam- 
pling may be very expensive (in terms of both money and effort) 
if data are to be collected by personal contact. If, for example, 
we plan to interview in their homes a sample of one hundred 
mothers with children attending the New York City schools, a 
simple random sample would probably be distributed so that no 
two mothers were within ten minutes’ walking distance of each
other (except in Manhattan where you might find a couple of 
mothers only five or six blocks apart). The difficulty is, of 
course, the requirement that cases be drawn independently, a 
condition which gives a very small probability of selecting two 
mothers from the same block. 

The condition of equal probability of selection for each element 
of a simple random sample may also be a disadvantage for some 
purposes. Suppose, for example, that we wish to estimate the 
aggregate expenditure for home economics of public schools in 
the State of Illinois, and plan, for this purpose, to draw a sample 
of seventy-five schools. Simple random sampling would require 
that the probability of drawing a school with five thousand 
enrollment be the same as the probability of drawing one with 
one hundred twenty enrollment and be the same for high schools 
as for elementary schools. With a sample of seventy-five schools,
selection of only one large high school with a substantial program 
in home economics might give an estimate two or three times 
greater than the correct figure. 


CLUSTER SAMPLING 


For some purposes, simple random sampling will prove most 
satisfactory. For other purposes, it may be desirable to use 
cluster sampling and sampling with unequal probabilities. How- 
ever, when we use cluster sampling or sampling with unequal 
probabilities, we must make appropriate modifications of our 
variance formulae. 

While sampling of groups is usually easier and cheaper than 
sampling of individuals, the reduced cost per sample case is not 
all gain. For a given population, the variance between cluster 
samples will usually be larger than the variance between samples 
of individuals. A good example of this is the estimation of the 
proportion of Negroes living in New York City. If we were to
draw several samples of one thousand individuals at random from 
all persons living in New York, the proportion Negro in these 
samples would probably vary only slightly from sample to sample. 
If we were to draw five city blocks and in each block take two 
hundred persons (giving the same number, one thousand, in the 
sample), the proportion Negro in different samples might vary 
tremendously. If we happened to draw a block from one of the 
areas with high Negro concentration, our sample would be twenty 
per cent or more Negro. If we happened to draw no such block, 
we might have no Negroes in the sample. While most cases 
will be less extreme than this illustration, even a small tendency 
towards greater homogeneity within clusters than there is over 
the whole population, can have very substantial effects on the 
variance of sample estimates when the clusters are large. This 
fact must be recognized in estimating these variances. 
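
The effect of within-cluster homogeneity can be given a rough numerical form. The sketch below uses a common approximation that is not stated in this paper: for clusters of equal size m and an intraclass correlation rho, the variance of a cluster sample mean is about 1 + (m - 1) * rho times that of a simple random sample of the same total size. The value of rho used here is purely hypothetical.

```python
def design_effect(cluster_size, rho):
    """Approximate ratio of cluster-sample variance to simple-random-sample
    variance, for equal-sized clusters and intraclass correlation rho."""
    return 1.0 + (cluster_size - 1) * rho


# Blocks of two hundred persons, as in the illustration above, with a
# hypothetical intraclass correlation of 0.05.
ratio = design_effect(200, 0.05)
print(f"variance is roughly {ratio:.2f} times that of a simple random sample")
```

Under this approximation even a small rho is multiplied by nearly the whole cluster size, which is why large clusters are so costly in variance.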

In deciding whether to use cluster sampling or some other 
sampling technique, relative costs and accuracy must be con- 
sidered. A cluster sample estimate may have three times the 
variance of a similar estimate from a simple random sample of the 
same size, but the cluster sample may cost only one-sixth as much. 
By tripling the size of the cluster sample we may be able to get 
the same accuracy as we can with simple random sampling at 
half the cost. 
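
The arithmetic behind this comparison can be checked with a few lines of Python. The sketch assumes, as the text implicitly does, that variance falls in inverse proportion to sample size when more clusters of the same kind are added, and that cost rises in direct proportion. The function name is illustrative only.

```python
from fractions import Fraction


def scaled_design(rel_variance, rel_cost, size_multiplier):
    """Variance and cost, both relative to the simple random sample,
    after multiplying the sample size by size_multiplier."""
    return rel_variance / size_multiplier, rel_cost * size_multiplier


# Cluster sample of equal size: three times the variance, one-sixth the cost.
variance, cost = scaled_design(Fraction(3), Fraction(1, 6), 3)
print(variance, cost)  # relative variance and relative cost after tripling
```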


SAMPLING WITH UNEQUAL PROBABILITIES 


Sampling with unequal probabilities also requires a modifica- 
tion of the familiar formulae for sampling errors. One method of 
using unequal probabilities is stratification. Here, the sampling 
units are divided into groups (or ‘strata’) and the sampling is 
done separately in each stratum, usually with different sampling 
probabilities in the various strata. Another method of using 
unequal probabilities is to assign the different probabilities 
directly to each sampling unit and draw the sample with these 
probabilities from the whole population. The use of unequal 
probabilities may require the weighting of individual observa- 
tions in preparing sample estimates, but this can frequently be 
avoided by drawing clusters with varying probabilities and sub- 
sampling the clusters in a manner which equalizes the probability 
of selection for individual elements. 

Cutting ourselves loose from the strait jacket of simple random
sampling does not mean that we go over to haphazard and uncon- 
trolled sampling methods. Cluster sampling and sampling with 
unequal probabilities preserve the principle of random selection 
(rather than selection based on judgment or convenience) and 
give sample results which conform to, and permit the application
of, probability theory. We must, of course, modify our theory
to conform to the sampling techniques actually used. A suitable 
combination of probability sampling techniques can be used to 
give unbiased samples with measurable sampling errors and such 
estimates will be no more expensive than estimates from ‘judgment
samples’ where the sampling errors and biases are not
determinable.


AN ILLUSTRATION—SELECTING A NATIONAL SAMPLE
OF SECONDARY-SCHOOL STUDENTS 


The use of a combination of methods can be illustrated by the 
situation in which we wish to develop norms for achievement tests 
to be used in secondary schools. We want to estimate the aver- 
age scores on these tests for all students attending secondary 
schools in the United States. Suppose that funds and other 
resources available for the work indicate that the sample must be 
restricted to some limited number of school systems and we have 
available a complete list of all secondary schools in the United 
States arranged by school system. To draw the sample we 
could: 

1) Group the school systems of the United States into strata. 
For maximum efficiency, the strata should contain approximately 
equal numbers of secondary-school students and should be as 
internally homogeneous as possible with respect to achievement 
of secondary-school students. The grouping into strata can be 
done on a judgment basis using any information which can be 
obtained regarding the probable achievement of secondary- 
school students in the system. We might also want to consider 
in our stratification the cost of giving tests in a school system 
and the size of the school system. Some of the strata may be 
established so that they contain only one school system. Since 
maximum sampling efficiency will call for drawing a single school 
system from each stratum, the number of strata should be equal 
to the number of school systems we wish to draw. 

2) Within each stratum, assign to each school system a probability
of selection. Usually, we would assign larger probabilities
to the larger school systems since the ‘weight’ of a school system 
in our population of secondary-school students will be propor- 
tional to its size. If data are available on enrollment of second- 
ary-school students, it might be desirable to make the probability 
of selecting a school system proportional to the number of 
secondary-school students enrolled. There are various ways of assigning
unequal probabilities of selection to different school systems.
For example, suppose a stratum contained two school systems, 
A, with an enrollment of 5,000 secondary-school students and B 
with 10,000. We want to make the probability of selecting B 
twice that of selecting A. A technique for doing this is to select
a five-digit random number from one of the standard tables of
random numbers and: (a) select A, if the random number is
between 00001 and 05000 (‘between,’ as used here, includes the
end points 00001 and 05000); (b) select B, if the random number is
between 05001 and 15000; (c) draw another random number, if our
first number is less than 00001 (i.e., zero) or greater than 15000, and
continue until we get a number between 00001 and 15000.

The random number technique can be used quite generally for 
assigning unequal probabilities (assigning to each school system 
to be drawn, a block of consecutive numbers equal to the enroll- 
ment in the school system). 

3) Use within each of the selected school systems, a similar 
technique for selecting schools, giving each school in the system 
a probability of selection proportional to the estimated number 
of secondary-school students enrolled. 

4) Finally, select secondary-school students at random within 
the selected schools using the school’s records of students 
enrolled. The proportion of students to be sampled in a given 
school can be so determined that every secondary-school student 
in the United States has the same chance of being included in the 
sample and it will then not be necessary to use weights in pre- 
paring the sample estimates. 
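
Steps 2 through 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, a pseudo-random generator stands in for the printed table of random numbers, and the school probability and overall sampling fraction in the usage lines are invented for the example.

```python
import random
from fractions import Fraction


def select_pps(units, rng, digits=5):
    """Pick one unit with probability proportional to its enrollment.

    units: list of (name, enrollment) pairs; each unit owns a block of
    consecutive numbers as long as its enrollment.
    """
    total = sum(size for _, size in units)
    upper = 10 ** digits - 1          # largest table entry, e.g. 99999
    if total > upper:
        raise ValueError("need more digits for this total enrollment")
    while True:
        r = rng.randint(0, upper)     # stands in for one table entry
        if r < 1 or r > total:
            continue                  # rule (c): draw another number
        cumulative = 0
        for name, size in units:
            cumulative += size
            if r <= cumulative:
                return name


def within_school_fraction(overall, p_system, p_school):
    """Sampling fraction inside a selected school that keeps every
    student's overall chance of selection equal to `overall`."""
    fraction = Fraction(overall) / (Fraction(p_system) * Fraction(p_school))
    if fraction > 1:
        raise ValueError("overall fraction too large for this school")
    return fraction


stratum = [("A", 5000), ("B", 10000)]
chosen = select_pps(stratum, random.Random(1950))
print(chosen)

# System B is drawn with probability 10000/15000; suppose (hypothetically)
# a school within it is drawn with probability 1/4 and we want every
# student to have an overall chance of 1 in 500.
print(within_school_fraction(Fraction(1, 500), Fraction(2, 3), Fraction(1, 4)))
```

Because the product of the three stage probabilities is the same for every student, the resulting sample needs no weights, which is the point made in step 4.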

In determining the number of strata to be set up, the number 
of schools to be selected, and the number of students to be 
tested, considerations of cost and desired accuracy are para- 
mount. The sampling technician can tell you which sample 
design will give you lowest costs for a given level of accuracy or 
will give you the least error for a fixed expenditure. However,
you must first determine what your goals of cost and accuracy 
are. If you go to a competent sampling technician for advice, he 
will ask: 

1) Exactly what do you wish to estimate from the sample? 

2) What precision must these estimates have? Is it satis- 
factory to get an estimated norm within five per cent of the true 
national average or must it be within one-half or one-tenth 
per cent? 

3) How much are you willing to spend to achieve the accuracy 
desired? 


IMPORTANCE OF CLEAR STATEMENT OF PURPOSES 


Efficiency in sample design depends to some extent on how 
much you know about the population to be sampled; it depends to
an even greater extent on how much you know about your own 
purposes and how clearly you have defined these purposes. In 
some cases, a clear definition of purpose is the key to an appar- 
ently insoluble problem. Consider the problem of determining 
how well a particular college entrance examination does in pre- 
dicting which college applicants will succeed. It has been 
suggested that solution of this problem requires that the college 
admit an unselected sample of applicants so we may observe 
which ones succeed and which fail. Such a solution would, 
however, disrupt college admissions procedures. There are 
definitions of this problem which make its solution impracticable 
without drastic interference with normal admissions procedure. 
There is, however, a definition of the problem which admits of a 
practical solution; namely, expose the applicants both to the test 
and to the other selection procedures. Accept all applicants 
who are satisfactory by all criteria and reject all applicants who 
are unsatisfactory by all criteria. For these applicants, our test 
can be no better and no worse than the alternatives since all 
criteria give the same answer. Since we have eliminated the 
obviously unsatisfactory applicants, and accepted the obviously 
satisfactory ones, accepting at random from the middle group (those on whom the criteria disagree)
will probably not have any serious consequence. If a simple 
random selection from this group is undesirable, differential 
selection can be used, still preserving the principle of random 
(and unbiased) selection. The probabilities can be determined so 
as to accept a minimum number of applicants from the ‘doubtful
competence’ group but still to have a sufficient number to permit 
an unbiased comparison of the test with the alternative criteria. 


SAMPLING FOR TEST CONTENT 


The problem of preparing equivalent forms of the same test is 
also one which can profit from a re-examination of basic concepts. 
One method of approaching the problem is to consider that: 
(1) the items in all forms of a test form a ‘universe’ of items and 
(2) the purpose of a particular form of test is to yield estimates 
of the individual’s average or aggregate score on all items in the 
universe. 

With this approach different forms of a test become samples of 
items from a universe of items. If we use an identical sampling 
procedure in drawing the sample of items for each form, the 
forms of the test are automatically ‘equivalent.’ We replace 
the difficult problem of determining (after constructing the test 
forms) whether items were drawn from the same universe by the 
much simpler problem of drawing items so that you know they 
are from the same universe. The techniques used in drawing a 
sample of persons are equally applicable to sampling items and 
the problem of securing maximum test reliability becomes iden- 
tical with the problem of securing minimum sampling variance. 
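
The idea of drawing each form by the same sampling procedure can be sketched in Python. The item universe, the content areas, and the numbers of items below are all hypothetical, and stratifying the universe by content area is an added assumption; the text requires only that the procedure be identical for every form.

```python
import random


def draw_form(universe, items_per_area, rng):
    """Draw one test form: a simple random sample of items, without
    replacement, within each content area of the item universe."""
    form = []
    for area in sorted(universe):
        form.extend(rng.sample(universe[area], items_per_area[area]))
    return form


# Hypothetical universe of item identifiers grouped by content area.
universe = {
    "arithmetic": [f"arith-{i}" for i in range(40)],
    "vocabulary": [f"vocab-{i}" for i in range(60)],
}
items_per_area = {"arithmetic": 4, "vocabulary": 6}

rng = random.Random(7)
form_a = draw_form(universe, items_per_area, rng)
form_b = draw_form(universe, items_per_area, rng)
print(form_a)
print(form_b)
```

In this sketch the two forms are drawn independently, so they may share an occasional item; they are equivalent in the sense that both are samples from the same universe by the same rule.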

The problem of getting equivalent forms of a test is part of the 
more general problem of sampling a ‘behavioral universe.’ In 
general, a test is used as a sample of an individual’s behavior or 
performance in some larger field. The problems of defining the 
behavioral universe and drawing a sample from it are similar to 
those of defining and sampling a universe of persons. Principles 
applicable to the problem of sampling persons are equally 
applicable to the sampling of performances. It can be seen that 
the most serious problem in sampling performances will be obtain- 
ing an inclusive listing of the universe. 


COMMON MISCONCEPTIONS ABOUT SAMPLING 


The preceding discussion has touched briefly on some points 
about which misconceptions are common. One of these miscon- 
ceptions is, of course, the application of the standard textbook 
treatment of simple random sampling to situations where the 
sampling is neither random nor simple; i.e., where groups are 
sampled rather than individuals or the sample selection is not on
a probability basis. 

Another area where misconceptions are common is ‘stratifi- 
cation.’ The most serious error in this field is the idea that 
stratification removes the need for random selection within strata. 
This is the fallacy which underlies the quota sampling technique 
used by many of the opinion polling agencies. The other mis- 
conception about stratification is emphasis on proportional 
sampling from the various strata. The emphasis in this paper 
has been on using stratification as a device for assigning unequal 
probabilities of selection to different population elements. 
There are dangers to assigning unequal probabilities: 

1) With unequal sampling probabilities the sample elements 
must be weighted by the reciprocal of the probability to obtain 
unbiased estimates. Failure to weight may result in very serious 
biases. 

2) By proper assignment of the sampling probabilities, it may 
be possible to make very substantial gains in the efficiency of a 
sample. On the other hand, a poor assignment of the sampling 
probabilities can result in very substantial losses in sampling 
efficiency. 
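
The weighting described in point 1 can be illustrated with a short Python sketch. The function names and the two-element sample are hypothetical, and "probability" is taken here to mean the probability that the element is included in the sample.

```python
def weighted_total(sample):
    """Estimate a population total from (value, selection_probability)
    pairs by weighting each value by the reciprocal of its probability."""
    return sum(value / prob for value, prob in sample)


def weighted_mean(sample):
    """Weighted mean using the same reciprocal-probability weights."""
    total_weight = sum(1.0 / prob for _, prob in sample)
    return weighted_total(sample) / total_weight


# Hypothetical sample: one element drawn with probability 1/2 and
# another drawn with probability 1/4.
sample = [(10.0, 0.5), (4.0, 0.25)]
unweighted = sum(value for value, _ in sample) / len(sample)
print(weighted_total(sample), weighted_mean(sample), unweighted)
```

The unweighted mean gives the over-sampled element too much influence, which is the bias that failure to weight introduces.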

Thus, unequal sampling probabilities should be used with
caution. Nevertheless, the advantages of a more flexible and 
more efficient sampling design should not be discarded because 
of the pitfalls resulting from careless use of the design. 

Another area of serious misconception is the field of bias in 
sampling. The very term ‘bias’ suggests that it is undesirable— 
and it is. There are, however, conditions where it is better to 
use a biased estimate rather than accept the even more undesir- 
able alternatives necessary to removing the bias. The criterion 
should be total error, which is composed of bias and variance. 
If avoiding a small bias means taking a very large variance, take 
the bias and keep the total error small. Most of our errors of 
judgment are, however, in the reverse direction; i.e., we strive 
for big samples (which mean, usually, small variances) and pay 
no attention at all to the sizes of the biases. The extremes of 
attending only to bias or attending only to variance are both 
undesirable. 
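
One standard way of combining bias and variance into a single total error, assumed here rather than spelled out in the text, is the mean squared error: the square of the bias plus the variance. The figures in the following Python sketch are hypothetical.

```python
def total_error(bias, variance):
    """Mean squared error of an estimate: squared bias plus variance."""
    return bias ** 2 + variance


# A slightly biased estimate with small variance against an unbiased
# estimate with large variance (hypothetical figures).
biased = total_error(bias=1.0, variance=4.0)
unbiased = total_error(bias=0.0, variance=25.0)
print(biased, unbiased)
```

On these figures the biased estimate has the smaller total error, which is the situation in which taking the bias is the better choice.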

The approach to sample design must be flexible and must avoid rote use
of techniques regardless of their applicability. We must take 
into account the factors of the particular sampling situation,
remembering that what is most efficient for one situation may be 
most inefficient for another situation. Above all, sample design 
must look to the purposes of the study and reject all solutions— 
no matter how ‘elegant’ they may be—which do not achieve 
those purposes. 


RELATIVE CONTRIBUTIONS OF VOCABULARY
AND AN INDEX OF INDUSTRIOUSNESS FOR 
ENGLISH TO ACHIEVEMENT 
IN ENGLISH* 


WILLIAM C. KRATHWOHL 


Institute for Psychological Services 


Illinois Institute of Technology 


It has always been assumed that work habits as well as ability 
influence the achievement that an individual accomplishes in sub- 
ject material, but the difficulty always has been to measure these 
intangible traits of work habits. Such a measurement is neces- 
sary if the relative contribution of ability and work habits to 
achievement is to be computed. 

A method to measure the work habits of industriousness and 
indolence† has been devised for mathematics by Krathwohl,?? 
and for English by the same author.‘ His device was the 
following: 

A) Give a group of individuals an aptitude test and an achieve- 
ment test in some subject. 

B) Compute the means and standard deviations of the scores 
for the group in each subject. 

C) Subject these scores to a linear transformation so that the 
means and standard deviations for each test are equal. In this 
study, derived scores were used which have a mean of 20 and a 
standard deviation of 4. 

D) Assume, other things being equal, that the score of an 
individual on the aptitude test should equal his score on the 
achievement test. 

E) Since other things usually are not equal, assume that the 
difference between an individual’s achievement score and his 
aptitude score is a measure of his industriousness. This measure 
is called an index of industriousness, or I.I., and is defined to be 
equal to the individual’s score on the achievement test minus his 
score on the aptitude test. 

* Presented at the Annual Meeting of the Midwestern Psychological 
Association at Detroit, Michigan on May 6, 1950. 

† For conciseness and also to avoid awkward construction, the word 
‘indolence,’ as employed in this investigation, is used, not in a derogatory 
sense, but rather as a substitute for ‘under-achievement.’ In the same 
way, the word ‘industriousness’ is used as a substitute for ‘over-achievement.’ 
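
Steps A through E can be sketched in a few lines. The sketch below is only an illustration: the function names and the raw scores are hypothetical, and the use of the population standard deviation in the linear transformation is an assumption, since the article does not say which form was used.

```python
from statistics import mean, pstdev


def derived_scores(raw, target_mean=20.0, target_sd=4.0):
    """Linear transformation to a common mean and standard deviation (step C)."""
    m = mean(raw)
    s = pstdev(raw)  # assumption: population standard deviation
    return [target_mean + target_sd * (x - m) / s for x in raw]


def industriousness_indexes(aptitude_raw, achievement_raw):
    """I.I. = derived achievement score minus derived aptitude score (step E)."""
    aptitude = derived_scores(aptitude_raw)
    achievement = derived_scores(achievement_raw)
    return [ach - apt for ach, apt in zip(achievement, aptitude)]


# Hypothetical raw scores for five students, for illustration only.
aptitude_raw = [30, 42, 55, 61, 48]
achievement_raw = [70, 65, 90, 88, 72]
indexes = industriousness_indexes(aptitude_raw, achievement_raw)
print([round(i, 2) for i in indexes])
```

Because both sets of derived scores have the same mean, the indexes for a group sum to zero: a positive index marks over-achievement relative to the group and a negative index marks under-achievement.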

In the case of English achievement, the experiment was con- 
ducted on three hundred eight sophomores at the Illinois Insti- 
tute of Technology who took an orientation test battery in 
September, 1946, which contained the vocabulary section of the 
Coöperative Reading Comprehension Test, Advanced Form, and 
the Coöperative Mechanics of Expression Test. These three 
hundred eight sophomores also took, in May, 1948 (one year and 
eight months after taking the orientation test battery) the 
sophomore achievement test, English Expression, which was 
constructed by the Measurement and Guidance Project of the 
Educational Testing Service. The vocabulary test was used as 
an aptitude test for English achievement since it correlated 0.58 
both with Mechanics of Expression and the English Expression 
Test, neither of which contained a specific section on vocabulary. 
The Mechanics of Expression Test was used as the initial English 
achievement test for computing indexes of industriousness and 
the English Expression Test was used as the achievement test 
from which to determine if I.I.’s measure industriousness. It 
was found that, for the most part, work habits persisted over a 
period of at least one year and eight months, and that if English 
aptitude was kept constant, there were substantial correlations 
between achievement in English and English I.I.’s. 

The question now became—how much of the achievement in 
English was contributed by vocabulary, how much by work 
habits, measured by the indexes of industriousness, and how 
much still remained to be accounted for? The investigation 
was made by means of multiple correlation techniques and the 
results are shown in Table I. 

In Table I, it happens that all correlation coefficients having 
absolute values above 0.20 are significant at the 1 per cent level 
or better and all those with absolute values less than 0.20 are not 
significant. These values were obtained from Guilford.! 

The subscript 1 in the table refers to English achievement as 
measured by the English Expression Test, the subscript 2 refers 
to vocabulary, and subscript 3 refers to indexes of industriousness 
for English. Below the word ‘Group’ are the names of the 
groups being investigated. To make the discussion easier to 
follow, the columns are numbered. Column 1, headed by N, 
gives the frequency of the groups being investigated. Column 2, 
headed r12, gives the correlation coefficient between English 
achievement and vocabulary. Column 3, headed r13, gives the 
correlation coefficient between English achievement and the I.I. 
for English. Column 4, headed r23, gives the correlation coefficient 
between vocabulary and the I.I. for English. Column 5, 
headed r1.23, gives the multiple correlation coefficient of 
English achievement against vocabulary combined with the I.I. 
for English. Column 6, headed Vocabulary, gives the percentage 
of variance in English achievement accounted for by vocabulary. 
Column 7, headed I.I., gives the percentage of variance accounted 
for by English I.I.’s, and Column 8, headed Ko, gives the percentage 
of variance still to be accounted for. 

TABLE I.—INTERCORRELATION AND MULTIPLE CORRELATION 
COEFFICIENTS AMONG (1) ENGLISH ACHIEVEMENT, (2) 
VOCABULARY, AND (3) INDEXES OF INDUSTRIOUSNESS 
FOR ENGLISH, AND THEIR RELATIVE CONTRIBUTIONS 

Column No.          1     2      3      4      5        6         7      8
Group               N    r12    r13    r23   r1.23  Vocabulary   I.I.    Ko

All                308   .58    .06   -.47    .70       46         2     52

Above average       94   .43    .48   -.02    .65       19        23     58
Average            121   .41    .38   -.03    .56       17        15     68
Below average       93   .18    .29   -.36    .44        6        13     81

Industrious         83   .77    .19    .10    .78       58         2     40
Normal             156   .66   -.08   -.34    .67       47        -1     55
Indolent            69   .56   -.03   -.44    .61       37         1     63
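
The article says only that multiple correlation techniques were used. One standard decomposition, assumed here and not stated by the author, credits each predictor with the product of its standardized regression weight and its zero-order correlation with the criterion; the two products sum to the squared multiple correlation. The function name is hypothetical.

```python
from math import sqrt


def contributions(r12, r13, r23):
    """Split the squared multiple correlation between two predictors.

    Each share is beta * r, the standardized regression weight times the
    zero-order correlation with the criterion (an assumed decomposition).
    """
    denom = 1.0 - r23 ** 2
    beta2 = (r12 - r13 * r23) / denom  # weight for vocabulary
    beta3 = (r13 - r12 * r23) / denom  # weight for the I.I.
    share2 = beta2 * r12
    share3 = beta3 * r13
    r_squared = share2 + share3
    return share2, share3, sqrt(r_squared), 1.0 - r_squared


# Rounded coefficients for the whole group, as reported in the first row of Table I.
vocab, ii, multiple_r, unexplained = contributions(0.58, 0.06, -0.47)
print(f"{vocab:.3f} {ii:.3f} {multiple_r:.3f} {unexplained:.3f}")
```

Applied to the rounded coefficients of the first row, this gives shares of roughly 45, 3 and 52 per cent, against the tabled 46, 2 and 52; differences of this size are what two-decimal rounding of the correlations would produce. A negative share, as in the Normal row, arises when the regression weight and the zero-order correlation differ in sign.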

An explanation is necessary for the method used to divide the 
original group into subgroups. The original group was first 
divided into the three subgroups of below average, average, and 
above average according to the derived scores of students in 
these groups on the vocabulary test which were 17 or less, 18 
through 22, and 23 or higher, respectively. Such a trichotomy 
was made because the theoretical frequency percentages for these 
groups were twenty-seven, forty-six and twenty-seven per cent, 
respectively, and this accorded roughly with the lowest quartile, 
middle half and highest quartile, respectively, of a normal 
frequency distribution. 
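
The theoretical percentages for the vocabulary groups can be checked with a short sketch. It assumes that the derived scores are normally distributed with a mean of 20 and a standard deviation of 4, and that the integer cut-offs are treated as continuous class limits (17.5 and 22.5); the second assumption is an inference from the treatment of the I.I.’s in the same article.

```python
from statistics import NormalDist

# Derived vocabulary scores: mean 20, standard deviation 4 (from the article).
derived = NormalDist(mu=20, sigma=4)

below_average = derived.cdf(17.5)       # scores of 17 or less
above_average = 1 - derived.cdf(22.5)   # scores of 23 or higher
average = 1 - below_average - above_average

print(f"{below_average:.3f} {average:.3f} {above_average:.3f}")
```

The two tails each come to about 27 per cent, in agreement with the text; the middle group is the remainder.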

The second division of the original group into indolent, normal 
and industrious groups was made accordingly as the index of 
industriousness for English for students in these groups was —3 
or less, from —2 through +2 and +3 or greater, respectively. 
Such a trichotomy was made because the theoretical frequency 
percentages for these three groups were twenty-five, fifty and 
twenty-five per cent respectively, and this accorded with the low- 
est fourth, the middle half and the highest fourth, respectively, 
of a normal frequency distribution. 

In order to get these theoretical percentages it is necessary to 
know the standard deviation of the I.I.’s. It can easily be 
shown by means of elementary statistics that, if the aptitude test, 
the achievement test and the I.I.’s are each normally distributed 
and if derived scores are used which have a mean of 20 
and a standard deviation of 4, sigma for the I.I.’s equals 
$4\sqrt{2(1 - r)}$, where r is the correlation coefficient between the 
aptitude test and that achievement test which is used to compute 
the indexes of industriousness. In the present instance r = 0.58, 
so that sigma equals 3.67. Next, if the I.I.’s are treated as a 
continuous instead of a discrete variable, the lower limit for 
an I.I. of 3 is 2.5 and, hence, the industrious group for an r of 
0.58 constitutes twenty-five per cent of the entire frequency 
distribution. From this value of the frequency distribution 
percentage of the industrious group, it follows that the indolent 
group forms twenty-five per cent of the population and the normal 
group fifty per cent. 
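
The arithmetic of this paragraph can be retraced as follows. The variance of a difference of two scores with equal standard deviations is twice the common variance times one minus their correlation, which gives the sigma quoted above. The sketch assumes, as the text does, that the I.I.’s are normally distributed about zero; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ii_sigma(r, sd=4.0):
    """Standard deviation of achievement minus aptitude when both have SD `sd`."""
    return sd * sqrt(2.0 * (1.0 - r))


sigma = ii_sigma(0.58)
ii_distribution = NormalDist(mu=0.0, sigma=sigma)

industrious = 1 - ii_distribution.cdf(2.5)   # I.I. of +3 or greater
indolent = ii_distribution.cdf(-2.5)         # I.I. of -3 or less
normal = 1 - industrious - indolent

print(f"{sigma:.2f} {industrious:.3f} {normal:.3f} {indolent:.3f}")
```

The sigma comes to about 3.67 and each tail to about 25 per cent, which agrees with the figures in the text.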

From the first row of Table I, where all of the three hundred 
eight students are considered simultaneously, several conclusions 
can be drawn. The correlation coefficient r13 of 0.06 in column 3 
between English achievement and English I.I.’s is practically 
zero and is not statistically significant, which indicates, as has 
also been shown in a previous investigation by Krathwohl,‘ that 
for the group as a whole English achievement is practically 
independent of work habits. This fact is further emphasized by 
the contribution toward the variance in English achievement by 
English I.I.’s in column 7, where this contribution amounts only 
to two per cent. The contribution from vocabulary of forty-six 
per cent leaves fifty-two per cent unaccounted for. However, 
when the influence of both English I.I.’s and vocabulary on 
English achievement is taken into account, the zero-order correlation 
coefficient r12 of 0.58, where only vocabulary is considered, 
rises to an r1.23 of 0.70 when English I.I.’s have been added. 
The negative correlation —0.47 in column 4 between vocabulary 
and English I.I.’s can be interpreted to the effect that the 
students with low vocabulary scores tend to be more industrious 
than those with high vocabulary scores. 

Previous work with indexes of industriousness has indicated 
that there might be wide variations in the various contributions 
when dealing with subgroups and this turns out to be true as is 
seen in the next six entries. 

When the entire group was divided into three subgroups—an 
above average, an average, and a below average group—on the 
basis of vocabulary scores, the correlation coefficients r12 in all 
three cases between English achievement and vocabulary in 
column 2 were lower than that for the group taken as a whole. 
In fact, for the below average group the correlation coefficient of 
0.18, in column 2, between English achievement and vocabulary, 
becomes low and is not significant. When, however, English 
I.I.’s are also taken into account, the multiple correlation coeffi- 
cient in column 5 for the below average group rises to 0.44 and is 
statistically significant at better than the one per cent level. 
Columns 6 and 7 for the below average group show that only 
six per cent of the variance in English achievement is due to 
vocabulary, but about twice as much or thirteen per cent is due to 
English I.I.’s. This makes only nineteen per cent of the variance 
accounted for by vocabulary and English I.I.’s, and leaves 
eighty-one per cent of the variance in column 8 still unaccounted 
for. 

Of what this eighty-one per cent consists is a matter of speculation 
until further investigations solve the problem. The fact 
that English achievement was measured by an objective test rules out the 
tendency of professors to upgrade their poor pupils in order to 
make their reports appear more favorable. Whatever the 
explanation is, certainly part of the unaccounted variance is due 
to the ceiling effect found with low ability groups which limits 
their achievement no matter how industrious they may be. 
Another part may be due to the effect of extra-curricular activities 
in which low ability groups are prone to indulge to compensate 
for their lack of ability. Still another part may be due 
to the course load which low ability students sometimes carry. 
Instead of taking fewer courses than normal or high ability 
students take, their course load frequently is higher because of 
the necessity to make up failures. Another factor in the high 
unaccounted for variance for English achievement may be a 
reading disability because difficulties in reading frequently are 
accompanied by vocabulary deficiencies. 

Column 4 shows that for the above average and average groups, 
the correlation coefficients between vocabulary and English I.I.’s 
are practically zero so that there is no tendency for students with 
above average and average vocabulary scores to be either 
industrious or indolent. On the other hand, the correlation 
coefficient in column 4 of the below average group, —0.36, which 
is significant at the one per cent level, indicates a relationship 
which should be investigated. It is possible that this statisti- 
cally significant negative correlation coefficient of —0.36 indicates 
a tendency of the below average students to compensate for low 
vocabulary scores by increased industriousness. It may also be due 
to a screening effect, in which only those individuals from the low 
ability groups who have high motivation remain. There also remains 
the possibility that it is sometimes an error to believe that lack of 
reward is accompanied by lack of effort. 

A comparison of columns 6 and 7 for the above average, average 
and below average groups shows that English I.I.’s play a rôle 
in English achievement as important as that of vocabulary, if not 
more important. 

When the students were divided into industrious, normal and 
indolent groups on the basis of their English I.I.’s of above 
average, average, or below average, respectively, column 2 shows 
that all the correlation coefficients r12 between English achievement 
and vocabulary were higher than any of the correlations 
for the ability groupings. In fact, the r12 of 0.77 for the industrious 
group was even higher than the multiple correlation coefficient 
r1.23 of 0.70 in row 1 column 5, where account was taken 
simultaneously of vocabulary and English I.I.’s. It should also 
be noted that the correlation coefficient between English achieve- 
ment and vocabulary is higher for the industrious group than 
for the normal group, and, in turn, that this coefficient for the 
normal group is higher than that for the indolent group. All 
of these facts are in conformity with a study of Sappenfield’s® 
showing that the accomplishments of students with high high-school 
effort indexes are more predictable than those of students with lower indexes. 

Practically all the effects of work habits are accounted for by 
the division of the entire group into industrious, normal and 
indolent groups as is seen from columns 6 and 7 for these three 
groups. Here English I.I.’s contribute practically nothing 
toward the variance in English achievement, whereas nearly all 
the known variance comes from vocabulary. 

When the predictions for success in English achievement are 
compared for ability groupings with work habits groupings, the 
following two conclusions can be drawn: First, the most accurate 
predictions of the effects of work habits can be made, as is seen 
from column 3, when the group is subdivided into above average, 
average and below average groups on the basis of vocabulary. 
Secondly, the most accurate prediction for success in English, 
in any case, is made, as is seen in column 2 when the entire group 
is subdivided into industrious, normal and indolent groups, and 
the prediction for success in English is made separately for each 
group on the basis of vocabulary. 

The results of this investigation show that under some condi- 
tions work habits are just as important for success in English 
as is vocabulary. Sometimes, they are even more important. 
Hence, in a counseling situation, a student should be told not only 
what his vocabulary score is and what it implies, but also the 
value of his index of industriousness for English. Then 
the proper advice can and should be given. 


SUMMARY 


1) The correlation coefficient of 0.58 between English achieve- 
ment and vocabulary can be raised to 0.70 if account is also taken 
of the work habits of industriousness for English. 

2) When a group of students is divided into above average, 
average and below average groups on the basis of vocabulary 
scores, the variance contributed to English achievement by work 
habits is as great or greater than that contributed by vocabulary. 

3) When a group of students is divided into industrious, normal 
and indolent groups, practically all the variance in English 
achievement is contributed by vocabulary. 

4) The best predictions for English achievement are made 
when the entire group is divided into industrious, normal and 
indolent groups on the basis of indexes of industriousness for 
English and when such predictions are made separately for each 
of these groups. 

5) Of the three groups, industrious, normal and indolent, the 
best prediction for success in English achievement on the basis 
of vocabulary is made for the industrious group. 

6) The poorest prediction for success in English achievement 
as measured by vocabulary is for the below average group in 
vocabulary where the variance unaccounted for by vocabulary 
and indexes of industriousness for English amounts to eighty-one 
per cent. It is not known of what this eighty-one per cent 
consists. 

7) There is a tendency in the normal and indolent groups for 
students with low vocabulary scores to achieve more in relation 
to their ability than students in these groups with higher vocabu- 
lary scores. 

8) Vocabulary accounts for a greater part of the variance in 
achievement in English for the industrious group than for any 
other group. 

9) The index of industriousness accounts for a greater part 
of the variance in achievement in English for the above average 
group in vocabulary than for any other group. 

10) Finally, in a counseling situation, indexes of industrious- 
ness for English should be used, as well as vocabulary scores. 


TEEN-AGERS’ ATTITUDES TOWARD PROBLEMS 
OF CHILD MANAGEMENT 


H. H. REMMERS AND A. J. DRUCKER 


Purdue University 


A good many years ago the German humorist, Wilhelm Busch, 
wrote: “Vater werden ist nicht schwer; Vater sein dagegen sehr!” 
Translated: “It is easy to become a father, but very difficult to be 
one.” 

Prediction in human affairs is difficult and hazardous, but if 
anything can be predicted about the near future of present 
high-school pupils it is that the major proportion of them will 
within a few years be parents and hence be faced with the problems 
of child-rearing. To assess teen-agers’ present understanding 
of their future rôles as parents, and indirectly thereby to infer 
parental understanding as well, the ninety items of the Form A 
plus Form B of the Stedman-Remmers* scale were given in 
November, 1949, to a national sample of 15,000 high-school pupils 
of the Purdue Opinion Panel as its Poll No. 24. The Panel is 
conducted by our Division of Educational Reference at Purdue 
University, Lafayette, Indiana, usually thrice yearly among pupils 
in nearly every state of the Union. 

Stedman has shown her scale, Attitudes Toward Child Behavior,† 
validated in terms of experts’ judgments, to be a reliable instru- 
ment to use in measuring the insight of mothers and other women 
into the behavior of children. 

Traditionally the replies to questions on any of our polls are 
analyzed and interpreted for schools one item at a time. Since 
Poll 24 consisted of a standardized test, by simple counts it was 
possible to report that, in a nationally representative sample, 
girls gave the correct answer in significantly greater proportions 
than did boys on sixty-two of the ninety items; that boys gave 
the correct answers more frequently than did the girls on four 
items; and that on fifty-five of the ninety items, the correct 





*Stedman, Louise A. An Investigation of Knowledge of and Attitudes 
Toward Child Behavior, Purdue University Studies in Higher Education, tx, 
Ph.D. Thesis, 1948. 

† To be published as How to Bring Up a Child by Science Research 
Associates. 


106 The Journal of Educational Psychology 


answer tended to be given more frequently by eleventh- and 
twelfth-graders than by ninth- and tenth-graders. In addition, a 
tendency could also be noted for pupils whose parents had 
attended college to give correct answers to several items more 
frequently than did pupils whose parents had not gone beyond 
high school.* 

Now the items used on Poll 24 were, after all, items of a test or 
scale designed to measure the extent of an individual’s attitudes 
toward or knowledge of child management (the distinction 
between attitudes and knowledge not always being clear here). 
With an N of 1600 teen-agers, the equivalent forms reliability was 
.85 and as a check the Kuder-Richardson Case III reliability was 
found to be .89 for the same sample. For analysis of responses to 
the items as a test a sample of 1132 cases was selected from the 
15,000 total sample to conform to known national teen-age char- 
acteristics. The sample was stratified with respect to high-school 
grade, geographical region (East, Midwest, South and Mt. 
Pacific), religion (Protestant, Catholic, Other or None) and level 
of parents’ education. It was randomized with respect to sex, 
size of community of residence and home environment as meas- 
ured by our seven-item House and Home Scale incorporated 
within the question sheet. Counts on the three randomized 
factors revealed proportions of pupils very close to known 
national characteristics. 

Table 1 shows the known characteristics of the sample and the 
means, standard deviations and significance of differences con- 
cerning the various groups comprising the sample of 1132 cases. 

How far may we go in interpreting these results? On the 
assumption that the sample is actually representative of the 
national teen-age population in each of the eight characteristics 
named, i.e., sex, high-school grade, religious preference, geo- 
graphical region, size of community of residence, home environ- 
ment, education of mother, and education of father, the data of 
Table 1 give an estimate of the average child management 
attitudes of the total national teen-age group of United States 
boys, of United States girls, of Protestant teen-agers of the coun- 





* Remmers, H. H. and Drucker, A. J. “How high-school youth believe 
children should be brought up.” The Purdue Opinion Panel, Report No. 24. 
Mimeographed, Purdue University, Division of Educational Reference, 
December, 1949. 

TABLE 1.—MEANS, STANDARD DEVIATIONS AND CRITICAL RATIOS 
OF 1132 HIGH-SCHOOL PUPILS’ SCORES ON STEDMAN’S 
Attitudes Toward Child Behavior 

Group                                                        N
Total                                                     1132

Boys                                                       548
Girls                                                      584

Ninth- and Tenth-Grade                                     558
Eleventh- and Twelfth-Grade                                574

Catholic                                                   260
Protestant                                                 680
Other or None                                              192

Rural                                                      561
Urban                                                      571

East                                                       242
Midwest                                                    431
South                                                      285
Mt. Pacific                                                174

Home Environment I                                         721
Home Environment II                                        411

Mothers’ Education I (did not complete high school)       716
Mothers’ Education II (completed high school or
  attended college)                                        416

Fathers’ Education I (did not complete high school)       718
Fathers’ Education II (completed high school or
  attended college)                                        414


Mean 


55. 


52. 
58. 


53. 
57. 


54. 
56. 
52. 


54. 
56. 


55. 
55. 
. 56 


55 


56. 


54. 


56. 


54. 


56 


77 
18 


46 
60 


69 
68 
78 


89 
22 


01 
53 


37 


.00 
57 


81 


84 


79 


414 56.89 


12 


12 


11. 


11 
11 


12. 


11 
11 


12 
11 


11 


12. 
12. 
13. 


11 
11 


11 


12. 


11 


12 


g 


.03 


.25 
16 


.92 
77 


44 
.76 
.85 


.23 
47 


41 
20 
95 
23 
.84 
91 
.94 
05 


.98 


.00 


* Significant at the one per cent level or better. 


F. Ratio 


(F = 8.84)* 


(F = .63) 


Critical 
Ratio 


7.76* 


5.87* 


2.27 
4.03* 


1.87 


3.50* 


2.74* 


2.84* 
try, etc., each considered, for sampling convenience, as independent 
factors. Thus the number of teen-agers in any multi-classification 
group of our sample, say, Southern rural Catholic girls, if 
any, would depend fundamentally upon the composition of the 
total sample participating in the Panel and upon our methods of 
stratification and randomization. We do not attempt to employ 
actual national proportions for any breakdowns involving more 
than one factor. 

In the national population of teen-agers several of these eight 
factors are probably intercorrelated—for example, home environ- 
ment and parents’ education, possibly religion and rural-urban 
status, rural-urban status and geographical region, and other 
combinations which may not be readily apparent. From Table 1 
we may infer what the child management attitudes of teen-agers 
are, dividing the subjects on one factor at a time and ignoring the 
intercorrelations among these factors. By using a factorially 
designed sample wherein the frequencies of these factors are 
made equal in all combinations, we can infer these attitudes 
with much more rigid controls over possible intercorrelations 
among the factors. 

The factorial design involves equal numbers of cases for each 
possible combination of five of these factors when four of them 
are treated dichotomously and one, grade placement, is divided 
into four parts. The five factors selected, based on significant 
differences of Table 1, were sex, grade in high school, mothers’ 
education, fathers’ education and home environment. Such a 
factorial arrangement necessitated sixty-four basic cells, in each 
of which were twenty-five cases, making a total N in this sample of 
1600. 
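
The cell count can be verified by enumerating the combinations. The level labels below are hypothetical stand-ins, and reading the four parts of grade placement as grades nine through twelve is an assumption; only the number of levels per factor and the twenty-five cases per cell come from the text.

```python
from itertools import product

# Levels per factor; the labels are illustrative, the counts follow the text.
factors = {
    "sex": ["boy", "girl"],
    "grade": [9, 10, 11, 12],            # assumed reading of "four parts"
    "mothers_education": ["I", "II"],
    "fathers_education": ["I", "II"],
    "home_environment": ["I", "II"],
}

cells = list(product(*factors.values()))
cases_per_cell = 25
total_n = len(cells) * cases_per_cell

print(len(cells), total_n)
```

Equal numbers in every cell are what make the factors mutually uncorrelated within the sample, so that each main effect can be examined with the others held constant.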

The principal advantage of the factorial design over the tradi- 
tional experiment, where one factor may be examined at a time, 
lies in its greater efficiency and comprehensiveness. Besides 
measuring the effects of each of the single factors, the effects of 
the interactions of all combinations of factors can also be examined, 
in each case the other main effects and/or interactions being 
equalized or held constant. 

From Table 1 we discovered, for example, that pupils whose 
fathers had completed high school made better scores than pupils 
whose fathers had not completed high school. We also learned 
that pupils classified Home Environment I did not do as well as 
those classified Home Environment II. What we do not know 
from this table is what part of this superiority of pupils with 
the higher-educated fathers actually came about through mem- 
bership in the Home Environment II group. This is only one of 
many such questions to which we sought answers in the factorial 
design. 

Table 2 shows the main results of the analysis of variance. No 
interaction except the first-order interaction involving sex and 
grade was significant when tested by the within-groups 
variance. Accordingly, all interactions of all orders except this 
one were combined with the within-groups variance to yield a combined 
residual error estimate. This sex × grade interaction was 
significant at the five per cent level when tested against this 
residual and so was taken as the error estimate against which to 
test the main effects of sex and grade. The residual variance was 
used, however, as the appropriate error term against which to 
test the other three main effects of mothers’ education, fathers’ 
education and home environment. 

Concerning the significant main effect, sex, girls’ scores are 
significantly higher than boys’ (Table 3); moreover, this superi- 
ority exists independent of the influences of grade, mothers’ 
education and home environment and over and above the influ- 
ence of the significant interaction involving sex and grade. 
Table 3 helps interpret the grade differences. The higher the 
grade the more superior the score, all other factors present in the 
investigation being held constant. The factors of mothers’ and 
fathers’ education, when each is treated in strict independence 
from the other four factors, also contain differences that are 
significant over and above the other factors we have introduced 
into this design. That is, the higher groups of students in terms 
of either parents’ education are slightly superior, sex, grade and 
home environment being equal. On the other hand, the superi- 
ority of pupils of Home Environment II in general (a difference 
found highly significant in the single factor comparison for the 
nationally representative sample) probably hinges upon the 
interrelationships of this classification with other classifications, 
notably parental education, since it loses this significance in the 
factorial design. The significance of the interaction between sex 
and grade in all likelihood comes about through the fact that 
although girls are superior to boys in general on the criterion 
variable, at the twelfth-grade level they have become much more 
superior than at the lower grade levels. 

In brief summary, then, results of this factorial design clarify 
the child management attitudes of the country’s teen-agers as 
follows: 

1) When social, economic and cultural home influences that 
may be reflected by parental education and a home environment 
scale are controlled, girls in high school are better than boys on 
the Stedman-Remmers test to measure attitudes toward and 
knowledge of child behavior, especially at the higher levels in 
school. This may be due to the faster maturation of teen-age 
girls, increased interest in child management problems, more 
direct experience with the management of children, more specific 
and relevant curricular content for girls, or, of course, all four of 
these reasons. 

2) Eleventh- and twelfth-grade boys and girls are better child 
managers, in terms of scores on the Stedman scale, irrespective of 
parental education and home environment, than ninth- and tenth- 
grade boys and girls. Clearly, the criterion variable is positively 
correlated with age, maturation and educational influences. 

3) Completion of high school or attendance at college by either 
parent is a good indication that the children will hold what experts 
consider the more acceptable attitudes and beliefs concerning 
child management affairs. Concerning the influence of home 
environment, the holding of such beliefs and attitudes is appar- 
ently more dependent upon the interrelationships among this and 
other influences. 

What are the implications of these differences attributable to 
parental education? First, we may recall that Stedman found 
the amount and type of education of women positively correlated 
with scores on the scale. Here we have found that the amount of 
education not only of the mother but of the father is also posi- 
tively correlated with scores made by the children, lending addi- 
tional support to the hypothesis that a definite relationship exists 
between parental attitudes and behavior and their children’s 
attitudes and behavior. The influencing of children by parental 
attitudes and example is generally thought to be so strong as to 
account almost totally for the individual’s basic personality 
pattern. 

A few comments now about the significant sex and grade dif- 
ferences found in this study: A group of P. T. A. women, after 
attending the first of a series of lectures on child training, decided 
it was ‘old hat’ to them. They further declared that fathers in 
general and not mothers really need this training. The problem 
was how to get these fathers trained in child management. 
Although Stedman did not attempt to show that adult females 
knew more of the correct answers on her scale than did males, here 
we are shown that this sex difference unquestionably exists 
already at the high-school level—well before marriage and parent- 
hood take prominence in the plans of most young people. More- 
over, this difference appears quite independently of certain 
cultural influences such as the level of parental education and the 
quality of home environment. What are you willing to bet that 
this sex difference will not continue to increase after marriage? 
Everyday observation seems to show that most males remain 
aloof in matters of child training, or their participation consists 
largely of doling out the physical punitive measures that they 
think are called for, or of pampering the offspring. This is at 
least in part because they have been encouraged into this rôle
first by parents and then by wives. Men’s magazines and the 
columns men read in the news do not deal very frequently or 
extensively with child management problems, nor, of course, do 
the science, business, engineering and trade courses that adult 
males enroll in. Women inform themselves, for better or worse, 
through readings in Ladies’ Home Journal, Parents’ Magazine, 
McCall’s and Woman’s Home Companion, probably often have 
functionally relevant curricula in high school, and enroll, by and
large, in home economics, psychology and sociology courses after
high school. 

A few high schools actually include in their curricula family 
living courses for boys. It appears quite likely that the high 
school, which daily promises to offer more and more terminal 
education for the bulk of its students who do not go on to college, 
is the place where the proper child training attitudes and knowl- 
edge should be encouraged. The high school is the place for such 
training, because it is the last formal educating agency encoun- 
tered by the majority of young people. The high-school years 
are appropriate for such training for we have been able to show 
that much of the variation in these teen-agers’ attitudes toward 
child management is associated with educational level in the high 
school and is perhaps attributable to individual maturation, to 
courses already given in the high school, or both. 














THE DIFFERENTIAL APTITUDE TESTS AS 
PREDICTORS IN ENGINEERING 
TRAINING 


RALPH F. BERDIE
Student Counseling Bureau
University of Minnesota


During the past two decades, correlation coefficients predicting 
success in colleges of engineering have varied about .65, occasion- 
ally reaching the lower .70’s and sometimes dipping into the 
.50’s. When compared to the effectiveness of predictions made 
in other academic fields, these engineering predictions are rela- 
tively accurate, but somewhat disturbing is the fact that predic- 
tion coefficients resulting from current studies have been no 
higher than, and in some cases not as high as, those reported 
ten or fifteen years ago.*:*:567 This may be due not to the tests 
but rather to limitations imposed by the criteria. 

In an attempt to increase the efficiency of prediction and in 
order to determine the most economical and practical, and the 
least time consuming method, a series of engineering prediction 
studies have been conducted at the University of Minnesota, the 
first one being reported in 1927. The results of these early 
studies have been reported in numerous papers!**8 and they 
provide a background for the studies conducted during the past 
two years. 

In the fall of 1947, the freshmen entering the Institute of 
Technology were given a battery of tests consisting mainly of 
achievement tests in mathematics, chemistry and the natural 
sciences. The results obtained with that battery are reported 
in a recent paper.? To summarize those results, the multiple 
correlation coefficients predicting first-quarter grades for three 
groups of students divided according to major subject, were .79, 
.66, and .66 and the coefficient predicting first-year grades for the 
entire group of three hundred seventy-two students was .63. 
The best predictors of those used were high school percentile 
rank, G. E. D. Test III, and the Coöperative Mathematics Test.

These results also confirmed a fact previously known. The
achievement test scores of students depended in part upon the
courses they had taken in high school. Those students, for
instance, who had taken chemistry in high school obtained a
mean score of 31 on the Coöperative Chemistry Test, with a
standard deviation of 11, while those students who had not 
taken chemistry in high school had a mean score of 14 with a 
standard deviation of 7. The same sort of relationship was 
found in mathematics. Meanwhile the two groups were much 
more equal on the basis of measures not directly related to high- 
school training. 

To find a battery of tests providing scores less dependent upon 
previous training than those we had been using, the available 
aptitude tests were surveyed and the Differential Aptitude Tests 
were selected for experimentation. First, we wished to
determine how well those tests would predict first-quarter and
first-year grades of engineering students. Secondly, we
wished to determine whether these scores would be relatively
independent of courses taken in high school. Finally, we
asked whether the difficulty level of the tests was appropriate
for people at this educational level and whether the low test
intercorrelations found in the standardization group of high-
school students would also be found for college freshmen.


PROCEDURE 


The Differential Aptitude Tests Form A were given to four 
hundred seventy-two freshmen entering the Institute of Tech- 
nology of the University of Minnesota in September, 1948. The 
tests were administered during the late summer and early fall 
prior to the beginning of fall-quarter classes. The tests provide 
eight scores titled: Verbal Reasoning, Numerical Ability, 
Abstract Reasoning, Space Relations, Mechanical Reasoning, 
Language Usage I Spelling, Language Usage II Sentences, and 
Clerical Speed and Accuracy. For three hundred eighty students 
these scores were available along with high-school percentile 
rank, scores on the American Council on Education Psychological 
Examination, 1937 form, and the Coöperative English Test,
Form OM. Sufficient academic grades were also available for
these students to allow an honor point ratio to be computed.
During the analysis of the data, the scores on the A. C. E. and
Coöperative English Test were found to contribute nothing to the
prediction of grades, and this group of three hundred eighty
subjects was supplemented by an additional group of fifty-four
students for whom complete data were available with the exception
of scores on the two tests which were discarded. This group
of four hundred thirty-four students constitutes that group used
to obtain predictions of first-year grades, as opposed to the
prediction of first-quarter grades.


TABLE 1. INTERCORRELATIONS BETWEEN TEST SCORES AND FALL QUARTER
HONOR POINT RATIOS FOR EACH OF THE THREE GROUPS
(University of Minnesota Institute of Technology, 1948)

[Table body not recoverable from this copy. For each group it gives
the means, standard deviations and intercorrelations of honor point
ratio (HPR), high-school percentile rank (HSR), the eight Differential
Aptitude Test scores, the Coöperative English Test and the A. C. E.,
followed by the multiple R's and regression equations. The groups are
Group I (Chemistry and Chemical Engineering Majors), Group II (Other
Majors with High-school Chemistry) and Group III (Other Majors without
High-school Chemistry).]

The group of three hundred eighty students was divided into 
three separate groups. All engineering freshmen take the same 
courses with the exception of chemical engineers and chemistry 
majors who take a special chemistry course. As these two 
chemistry groups have been found to obtain higher test scores 
and grades than the other engineering students, and insofar as 
their curriculum differed, the data for them were analyzed 
separately. This group will be called Group I. Among the 
remaining students, consistent differences have been found as 
we have already noted between those who have had chemistry 
in high school and those who have not had chemistry, so the data 
for these two groups were analyzed separately. The group of 
students not majoring in chemical engineering or chemistry but 
who have had chemistry in high school will be called Group II 
while the non-chemists who have not had chemistry will be called 
Group III. 

At the end of the first quarter, for each of the three groups 
the intercorrelations between predicting variables and _first- 
quarter grades were determined. Multiple regression equations, 
predicting grades from the other eleven variables, were computed 
and the significant Beta weights were determined. Then, using 
only those variables for which the Beta weights were significant 
at the one per cent level of probability, new regression equations 
were computed. 

At the end of the first year, those predicting variables which 
had been found most promising in the preceding analysis were 
selected and for the group as a whole, the multiple regression 
equation was computed. For this entire group, specific grades 
in mathematics, chemistry, and drawing were obtained and the 
correlations between test scores and these grades determined. 


RESULTS 


The correlations between first quarter honor point ratio and 
predicting variables for the three groups are presented in Table 1. 
Using all eleven predicting variables, the multiple R’s are, for
the chemical engineers and chemists .74, for Group II, .67, and
for Group III, .70. Of the eleven beta coefficients, only one was 
significant at the one per cent level for Group I, two significant 
in Group II, and one significant in Group III. Using only the 
variables for which beta coefficients were significantly different 
from zero, the obtained prediction coefficients were .65, .63 and 
.60, respectively. Corresponding multiple R’s obtained for three 
similar groups the preceding year, using more achievement tests, 
were .74, .64, and .56. 

In the prediction of first-quarter grades, for only Group II 
did any one of the scores derived from the Differential Aptitude 
Tests contribute significantly, and that was the Numerical Ability 
Test. High-school percentile rank consistently had the highest 
correlation with grades, and Numerical Ability, Coöperative
English, and A.C.E. tended to follow in that order. 

In Table 2 are presented the correlations between first-year 
honor point ratio and selected predicting variables for all students 
grouped together regardless of major curriculum or previous 
courses. The two most promising predictors, high-school percentile
rank and Numerical Ability score, yielded a multiple
correlation of .62 with honor point ratio, both variables contributing
significantly to the prediction. The addition of the Language
Usage score did not significantly add to the effectiveness of
prediction. This multiple R of .62 equals the one of .62 obtained
the previous year between first-year honor point ratio and
high-school percentile rank, G. E. D. Test III and Coöperative
Mathematics Test.


TABLE 2. INTERCORRELATIONS BETWEEN TEST SCORES AND
FIRST-YEAR HONOR POINT RATIOS

N = 434

                   HPR     HSR   Num Abil   Lang Usage I
Mean              1.13   75.76     33.69       66.78
SD                 .69   18.74      4.28       20.77
HPR                        .55       .43         .25
HSR                                  .31         .37
Number Ability                                   .41

HPR - 3 predicting variables: R = .62
HPR - HSR and number ability: R = .62
Beta weights significant at 1 per cent level: HSR, Num Ability
HPR = .017183 (HSR) + .046198 (number ability raw score) - 1.731

The regression equations for predicting first-quarter grades 
in each of the three groups and for predicting first-year grades 
for the total group are included in Tables 1 and 2. 
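
As an arithmetic check, the two-predictor figures in Table 2 can be approximately reproduced from the printed correlations, means and standard deviations alone. The sketch below uses the standard two-predictor formulas for standardized weights and multiple R. It assumes the Table 2 correlations read .55, .43 and .31, and the function and variable names are illustrative rather than anything used in the study. Because the inputs are rounded to two decimals, the results agree with the printed equation only approximately.

```python
import math

# Values read from Table 2 (N = 434). Decimal points are assumed where
# the printed table omits them.
r_hpr_hsr = 0.55   # honor point ratio vs. high-school percentile rank
r_hpr_na = 0.43    # honor point ratio vs. Numerical Ability
r_hsr_na = 0.31    # high-school percentile rank vs. Numerical Ability
mean_hpr, sd_hpr = 1.13, 0.69
mean_hsr, sd_hsr = 75.76, 18.74
mean_na, sd_na = 33.69, 4.28


def two_predictor_regression(r_y1, r_y2, r_12):
    """Return (beta1, beta2, multiple R) for one criterion and two predictors."""
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    multiple_r = math.sqrt(beta1 * r_y1 + beta2 * r_y2)
    return beta1, beta2, multiple_r


beta_hsr, beta_na, multiple_r = two_predictor_regression(
    r_hpr_hsr, r_hpr_na, r_hsr_na
)

# Convert standardized weights to raw-score weights and an intercept.
b_hsr = beta_hsr * sd_hpr / sd_hsr
b_na = beta_na * sd_hpr / sd_na
intercept = mean_hpr - b_hsr * mean_hsr - b_na * mean_na


def predict_hpr(hsr, numerical_ability):
    """Predicted first-year HPR from the equation printed in Table 2."""
    return 0.017183 * hsr + 0.046198 * numerical_ability - 1.731


print(round(multiple_r, 2), round(b_hsr, 4), round(b_na, 4), round(intercept, 2))
```

Evaluating the printed equation at the group means (HSR 75.76, Numerical Ability 33.69) returns approximately the mean honor point ratio of 1.13, as a least-squares equation should.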

Some evidence has also been found that the scores of the 
Differential Aptitude Tests are not as closely related to the 
courses a student has taken in high school as were the scores of 
the previously used achievement tests. Insofar as the students 
who select different courses in high school have different abilities 
as shown by some tests, here, for instance the A.C.E., the effects 
of courses upon test scores cannot be completely determined in an 
experiment such as this. 

Examination of Tables 1 and 2 will reveal the restricted ranges 
of scores obtained by these college freshmen on the Differential 
Aptitude Tests. For instance, on the Numerical Ability Test, 
the best predictor, the mean score for the total group was 33.69 
and the standard deviation 4.28. The highest possible score is 
40. In the norm manual, the mean score on this section for 
twelfth-graders is presented as 23.6 with a SD of 9.4. 
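
The degree of restriction can be put in rough numerical terms using only the figures quoted above. The following is a small illustrative calculation, not part of the original analysis; the variable names are hypothetical.

```python
# Figures quoted in the text for the Numerical Ability Test.
max_score = 40
freshman_mean, freshman_sd = 33.69, 4.28
norm_mean, norm_sd = 23.6, 9.4   # twelfth-grade norms from the manual

# Spread of the freshman scores relative to the norm group.
sd_ratio = freshman_sd / norm_sd

# Position of the freshman mean within the twelfth-grade distribution,
# in norm-group standard deviation units.
freshman_mean_in_norm_sd = (freshman_mean - norm_mean) / norm_sd

# Raw-score room left between the freshman mean and the highest possible score.
points_below_ceiling = max_score - freshman_mean

print(round(sd_ratio, 2), round(freshman_mean_in_norm_sd, 2),
      round(points_below_ceiling, 2))
```

On these figures the freshman standard deviation is less than half that of the norm group, and the freshman mean lies about one norm-group standard deviation above the twelfth-grade mean, leaving only about six raw-score points below the maximum.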

On this test, and on others, the distributions of scores were 
quite skewed and showed a marked kurtosis. The mean scores 
were so high in some cases that they suggest the test is not 
sufficiently difficult for use with this population. 

The three sets of intercorrelations of scores on the aptitude 
tests indicate their degree of independence. In Group I, for the
28 coefficients, the median was .28, the highest coefficient .59,
and the lowest .04. In Group II, the median intercorrelation
coefficient was .25, the highest coefficient .58, and the lowest
.05. The figures for Group III are almost identical. The
intercorrelations between tests as presented by the test authors and
as obtained in the standardization procedure tend to be about 
the same and in some cases higher, with the median coefficient 
being about .45 and the highest and lowest ones being .67 and 
.06, respectively. The two Language Usage Tests are relatively 
closely related to each other as are the Language Usage Tests 
and the Verbal Reasoning Test. In general, however, the relatively
poor job that this battery does in predicting engineering
training success is perhaps not due to the failure of the tests to
measure discrete factors, but rather to the inappropriate difficulty
level and range of the tests.
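
The count of 28 coefficients per group follows from pairing the eight aptitude scores with one another. A minimal sketch of that count, together with the kind of summary reported above (median, highest, lowest), is given here; the helper names are hypothetical and no coefficients from the study are reproduced.

```python
from itertools import combinations
from statistics import median

# The eight Differential Aptitude Test scores named in the Procedure section.
DAT_SCORES = [
    "Verbal Reasoning", "Numerical Ability", "Abstract Reasoning",
    "Space Relations", "Mechanical Reasoning", "Language Usage I Spelling",
    "Language Usage II Sentences", "Clerical Speed and Accuracy",
]


def pairwise_labels(names):
    """Every unordered pair of distinct tests, one per intercorrelation."""
    return list(combinations(names, 2))


def summarize(coefficients):
    """Return (median, highest, lowest) of a collection of coefficients."""
    values = sorted(coefficients)
    return median(values), values[-1], values[0]


pairs = pairwise_labels(DAT_SCORES)
print(len(pairs))
```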

The test most closely related to grades in mathematics was 
the Numerical Ability Test, with a correlation coefficient of .45. 
In the preceding study, the correlation found between mathematics
grades and the Coöperative Mathematics Test was .43.
The Numerical Ability Test was also the test most closely related 
to grades in chemistry, with a correlation coefficient of .35. The 
Space Relations Test was the one most closely related to grades 
in drawing, with a correlation of .38. The correlation coefficients 
between specific tests and grades in specific courses, although 
low, are in the expected direction. They tend to be somewhat 
lower for these college freshmen than those coefficients for high- 
school students presented by the test authors in the manual. 


CONCLUSION 


In conclusion, the results of the analysis indicate that the 
Differential Aptitude Tests, with the exception of the Numerical
Ability Test, do not contribute significantly to the prediction of 
academic success in the Institute of Technology at Minnesota. 
The tests, which appear to differentiate adequately between 
twelfth-graders, for whom they were apparently designed, are 
not of sufficient difficulty for use with these engineering students. 
A version of the Numerical Ability Test which is more difficult 
than the present forms may prove to be of value for this purpose.


BOOK REVIEWS


Lee J. Cronbach. Essentials of Psychological Testing. New
York: Harper and Brothers, 1949, pp. 475.


In Essentials of Psychological Testing, Cronbach has set himself 
the task of writing a textbook for the beginning measurement 
course, in which emphasis is on basic concepts, and principles of
testing are illuminated by intensive study of certain selected
tests. He wisely refrains from attempting to introduce the 
student to all, or even a great number, of the available instru- 
ments, preferring to inculcate general principles that will enable 
the student to evaluate any test. Despite this approach, the 
book is in no wise overly theoretical or abstract; the principles 
typically emerge from concrete, detailed discussions of specific 
tests or studies. 

The book is divided into three parts. Part I, Basic Concepts, 
describes types of psychological tests, their purposes and their 
uses; it covers general principles of selecting and administering 
tests and includes a chapter on the elementary statistical methods 
necessary for an understanding of the rest of the volume. Parts 
II and III are concerned, respectively, with tests which Cronbach 
classifies as ‘tests of ability, or maximum performance,’ and 
‘tests of typical performance.’ According to this classificatory 
scheme—admittedly imperfect—there are treated, in Part II, 
tests usually included in the categories of intelligence, aptitude, 
special ability and achievement tests; while in Part III are dis- 
cussed personality, interest and attitude questionnaires, behavior 
ratings and the projective techniques. Except for a chapter on 
factor analysis in Part II, and chapters on general problems of 
measuring typical performance and on the use of test results in 
counseling in Part III, each chapter is concerned with a specific 
type of test. 

Each of these latter chapters follows a fairly similar pattern: an 
opening section devoted to a discussion of the function of the 
kind of test with which the chapter is concerned; certain general 
observations about the type of test; and a detailed discussion of 
one or more tests representative of the type. In the course of 
this discussion guiding principles for the evaluation of similar 
tests are educed. The Stanford-Binet, for example, is described 
as typical of Binet-type scales, Thurstone and Likert scales as
representative of the self-report techniques in the attitude field,
etc. In this fashion the student is introduced to a considerable 
number of the more widely used tests and to critiques of them, 
both by the author and by others. The chapters vary in quality; 
those on Binet-type tests and diagnostic tests (represented by the 
Wechsler-Bellevue test) come much closer, in the reviewer’s judg- 
ment, to fulfilling the author’s purposes than do the chapters on 
tests of special abilities and on interest measures, which tend to 
be more nearly of the cataloging type. 

Cronbach’s approach gives rise to problems of organization. 
Thus, the chapter on tests of general ability includes a form for 
analyzing tests which can be used with equal appropriateness for 
appraisal of almost any type of test. Similarly, the chapter on 
tests of special abilities includes a section on diagnostic test 
batteries which would appear to be more nearly related to the 
tests of general ability covered in another chapter. The dis- 
cussion of the Wechsler-Bellevue test gives rise to certain general 
observations on reliability and validity of diagnostic measures 
which are equally pertinent for many of the other types of tests 
discussed. It may be that the particular location within the 
book of the discussion of a topic is not too important, provided 
that the entire volume is systematically covered during a course. 
The reviewer hazards a guess, however, that this type of treat- 
ment does not facilitate generalization of principles concerning 
validity, reliability, norms, etc. from one type of test to another. 

The reviewer found himself uneasy about some of the defini- 
tions proposed by Cronbach. Specifically, to say ‘‘a test is valid 
to the degree that we know what it measures or predicts”’ 
(page 48) seems no improvement over the usual formulation; 
nor is the definition of a test as “‘a systematic procedure for com- 
paring the behavior of two or more persons”’ (page 11) likely to 
meet with universal approval. Cronbach accepts the point of 
view that an attempt to predict underlies every use of testing, 
and his treatment of validity in several places reflects acceptance 
of this postulate. 

The book is readable by the beginning student in measure- 
ments, and the student who masters its contents will have a wide 
acquaintance with various types of tests as well as sound bases 
for evaluating psychological measures. There is an abundance,
perhaps an over-abundance, of research data and citations from
the literature. Cronbach has favored the more recent literature
(more than half of about five hundred twenty studies cited have 
been published since 1942), which fact may tend to give the 
beginning student a distorted historical perspective in the field. 

Particularly praiseworthy are the thought-provoking questions 
liberally provided throughout the text. These questions are well 
calculated to further the author’s goal of developing under- 
standing. This reviewer, however, wishes that the questions 
were not interspersed as they are in the body of the text, but were 
concentrated at the ends of the chapters. In this respect and in 
others, the designer of the book has done the author no favor; 
typographically, the book leaves much to be desired. Appen- 
dices include a list of reviews of studies of testing (from which 
mention of the AERA periodical reviews of psychological tests is 
strangely missing), and a list of test publishers and distributors. 

ROGER T. LENNON
Division of Test Research and Service 
World Book Company 


Robert Hoppock. Group Guidance: Principles, Techniques, and
Evaluation. New York: McGraw-Hill Book Company, Inc.,
1949, pp. 393.


In the opening pages of this book, the author defines group 
guidance as any activity which influences an individual in making 
plans for his own future. Under this definition he includes 
orientation, educational guidance, vocational guidance, character 
education, encouragement of wholesome social attitudes, per- 
sonality development and psychotherapy. Then, making use of 
his prerogative, he limits the discussion to the first three of these 
topics. Although such a segmentation is easily accomplished for
purposes of discussion, when carried over into the guidance program it
weakens such a program to the point of its losing all effectiveness.
Nowhere does Hoppock caution the reader against this danger.

Many interesting techniques are presented with ample illus- 
trations provided in the twenty-two appendices which constitute 
more than one-third of the total volume. Some of these tech- 
niques are ones developed by the author or by his students and 
they are known to a relatively limited number of guidance workers.
Other techniques are similar to those which have been used
by workers for many years but the descriptions of these are
succinct and the illustrations and examples informative. 

In general, the book is perhaps too heavily weighted with these 
descriptions of techniques. Divided into three parts (principles,
techniques and evaluations), the book has only thirty-eight
pages devoted to principles, and of these, four are concerned
with where to put guidance, fourteen with what to
include, and four with who should teach it. The educational 
philosophy underlying a guidance program is given little heed 
and the relationships existing between group guidance and the 
remainder of the school are barely discussed. The jacket of the 
book says the text ‘‘tells the beginner what to do and how to 
do it.”” Unfortunately, it does not tell him why it should be 
done. 

The book will undoubtedly find greatest use as a text in 
beginning courses in guidance, perhaps during abbreviated sum- 
mer courses. Used in conjunction with such books as Fedder’s 
and Shartle’s, it can contribute to these courses, but a person 
who has read only this book will have a distorted idea of group 
guidance. The experienced guidance worker with sound basic 
training might find the book much more useful, for it is a source 
of practical ideas that can be used in both secondary schools and 
colleges. 

Students’ follow-up of alumni, with an interesting example of 
the follow-up of guidance and personnel workers, plant visits, 
group conferences, student survey of jobs, case conferences, 
laboratory studies, and self-measurement are some of the tech- 
niques described. For each of these techniques, the specific 
purposes are stated, the procedures described, possible variations 
discussed, and comments presented concerning the effectiveness 
of these techniques. 

Since the appearance of Rogers’ book, Counseling and Psychotherapy,
in 1942, personnel workers have become increasingly
aware of the value of presenting complete and accurate tran- 
scriptions of interviews and conferences. Hoppock includes 
several records of conferences, these records being based on short- 
hand notes of the discussions. They include a discussion of 
returns from a follow-up, a group conference to provide occupa- 
tional information, a case conference, and a job interview. The 
transcriptions will serve to introduce these various techniques to
workers not acquainted with them, but additional comments by
the author would assist the beginning student to evaluate and 
appraise the illustrations. 

Both the beginning guidance worker and the experienced 
worker will derive much from the discussion of evaluation. First 
are presented general conclusions and specific results based on the 
available research studies. Next is presented a discussion of 
research methods and terms, with a short but necessary section 
indicating errors often made in evaluation studies. Next are 
presented brief summaries of twenty original researches which
were reported between 1926 and 1948. These summaries 
emphasize the fact that an evaluation study is no better than the 
measuring or evaluation instrument used, regardless of the design 
of the experiment or the controls exercised. Far too many of 
these evaluations have made use of criteria which have little 
claim to validity and this may in part explain the frequently 
reported results which indicate either little or sometimes no effect 
following guidance. 

Group Guidance will do little in helping the student or the more 
experienced worker to clarify his own guidance philosophy or to 
understand the basic purposes of a guidance program. It will, 
however, suggest techniques and methods that will increase the 
effectiveness of such a program once this understanding is 
obtained.

RALPH F. BERDIE
Student Counseling Bureau
University of Minnesota